Shannon's celebrated formula $W \ln(1 + P_0/(N_0 W))$ for the capacity of
a time-continuous communication channel with bandwidth $W$ cps, average
signal power $P_0$, and additive Gaussian noise with flat spectral density
$N_0$ has never been justified by a coding theorem (and "converse"). Such a
theorem is necessary to establish $W \ln(1 + P_0/(N_0 W))$ as the supremum of
those transmission rates at which one may communicate over this channel
with arbitrarily high reliability as the coding and decoding delay becomes
large.

In this paper, a number of physically consistent models for this time-continuous
channel are proposed. For each model the capacity is established
as $W \ln(1 + P_0/(N_0 W))$ by means of a coding theorem and converse.

I. INTRODUCTION 

As an idealized model for the time-continuous Gaussian channel (with
bandwidth $W$ cycles per second, two-sided noise spectral density $N_0/2$,
and average power $P_0$), Shannon^{1,2} employed the mathematical time-discrete
channel which passes $2W$ real numbers $x$ per second, with the
average of $x^2$ restricted to be $P_0$. Each input $x$ is perturbed by an
independent "noise" random variable which is Gaussian with mean zero
and variance $N_0 W$. If by "channel capacity" we mean the maximum
rate at which a channel is capable of transmitting information with
arbitrarily small error probability as the coding and decoding delay
becomes large, then the capacity of this time-discrete channel is given
by the celebrated formula $W \log_2(1 + P_0/(N_0 W))$ bits per second (or
$W \ln(1 + P_0/(N_0 W))$ nats per second).

In order to show that the capacity is given by this formula, it is
necessary to prove a coding theorem (showing the possibility of achieving
"error-free" communication at any rate less than $W \log_2(1 + P_0/(N_0 W))$),
and a "converse" (showing the impossibility of achieving
"error-free" coding at a rate exceeding this quantity). For this purely
mathematical channel these theorems have been proved, and there
is no question as to the meaning and validity of the capacity formula.
The way in which Shannon arrived at this time-discrete model for a
"physical" time-continuous channel is described in detail in Section II. 
It will suffice to remark here that there remain questions as to the relation 
of this time-discrete model (and the resulting capacity formula) to a 
physically meaningful time-continuous channel. These difficulties center 
on the fact that the inputs and outputs of the time-continuous channel 
are band-limited signals which are not physically realizable. As we shall 
see in Section II, such assumptions lead to a number of anomalies and 
absurdities. 

Our purpose in this paper is to find physically consistent mathematical
models for the time-continuous band-limited Gaussian channel,
and to establish their capacity by means of a coding theorem and converse.
Schematically, our results are of the following form:

Let $a(T,W,P_0)$ be a class of functions which are "approximately
band-limited to $W$ cycles per second and approximately time-limited to
$T$ seconds", and which have "average power" $P_0$. The channel inputs
must be members of $a$. The noise is additive, stationary, and Gaussian
with flat two-sided spectral density $N_0/2$ in the band $\pm W$ cycles per
second (or "approximately" given as above). Then the channel capacity,
defined as the maximum rate for which arbitrarily high reliability is
possible (using signals from $a$) as $T$ becomes large, is given
"approximately" by $W \log_2(1 + P_0/(N_0 W))$. The term "approximately" used
here will, of course, be given a precise meaning below.

In Section II, Shannon's model and results are discussed, and in Sec- 
tion III our models and results are stated completely and discussed. 
Our proofs follow in Sections IV and V. A glossary is included at the 
end of the paper. 

II. THE SHANNON MODEL 

2.1 The Time-Discrete Channel 

In order to fix ideas as well as to review some results which will be
required subsequently, let us consider the following class of (time-discrete)
channels: every $T$ seconds the input to the channel is a sequence
of $n = [\alpha T]$ real numbers $\mathbf{x} = (x_1, x_2, \ldots, x_n)$, where $\alpha$ ($0 < \alpha \le \infty$)
is a fixed parameter. Further, the input sequence must satisfy the
"energy" constraint

$$ E(\mathbf{x}) = \sum_{k=1}^{n} x_k^2 \le PT, \qquad (1) $$

where $P > 0$ is another fixed parameter, and where $E(\mathbf{x})$ is, as indicated,
the sum of the squares of the components of $\mathbf{x}$.
The channel output is also a real $n$-sequence $\mathbf{y} = (y_1, y_2, \ldots, y_n)$,
where

$$ y_k = x_k + z_k, \qquad k = 1, 2, \ldots, n, \qquad (2) $$

and the noise digits $z_k$ ($k = 1, 2, \ldots, n$) are independent, normally
distributed random variables with mean zero and variance $N$.

Let us assume that this channel is to be used in the communication
system of Fig. 1. The output of the message source is a sequence of
independent and equally likely binary digits which appear at the input
of the coder at the rate of $R_b$ digits (bits) per second. Every $T$ seconds
the coder input is one of $M = 2^{R_b T}$ binary sequences, each sequence
being equally likely. Let us number the possible messages as $1, 2, \ldots, M$.
The coder contains a mapping of the message set $\{1, 2, \ldots, M\}$ to a
set (called a code) of $M$ real $n$-sequences $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_M\}$ (called
code words) satisfying (1). If message $i$ ($i = 1, 2, \ldots, M$) is the coder
input, then the coder output (and hence channel input) is the code
word $\mathbf{x}_i$. Since it takes $T$ seconds to transmit a code word, the system
can process information continuously without a "backup" at the coder
input. The transmission rate is $R_b$ bits per second or $R = (\ln 2)R_b$
nats per second.

It is the task of the receiver (or decoder) to examine the received
sequence $\mathbf{y}$ and determine which of the $M$ code words was actually
transmitted. Thus, we may think of the decoder as a rule which assigns
to each possible received sequence $\mathbf{y}$ a code word $\mathbf{x}_i$. Let us denote by
$P_{ei}$ the probability that the decoder chooses the wrong code word given
that $\mathbf{x}_i$ was transmitted. The over-all error probability is then

$$ P_e = \frac{1}{M} \sum_{i=1}^{M} P_{ei}. \qquad (3) $$
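As an illustration of definition (3), the following sketch estimates $P_e$ by simulation for the channel (2). The random code and the minimum-distance decoder are assumptions chosen for this example (the text fixes neither), and `random_code` and `estimate_pe` are hypothetical helper names. For independent Gaussian noise of equal variance, minimum-distance decoding coincides with maximum-likelihood decoding.

```python
import numpy as np

def random_code(M, n, energy, rng):
    """Hypothetical helper: M random code words, each scaled so E(x) = energy, meeting (1)."""
    x = rng.normal(size=(M, n))
    return x * np.sqrt(energy / (x ** 2).sum(axis=1, keepdims=True))

def estimate_pe(code, noise_var, trials, rng):
    """Monte Carlo estimate of P_e = (1/M) sum_i P_ei from (3), minimum-distance decoding."""
    M, n = code.shape
    errors = 0
    for i in range(M):
        # `trials` received sequences y = x_i + z, as in (2)
        y = code[i] + rng.normal(0.0, np.sqrt(noise_var), size=(trials, n))
        # squared distance from every y to every code word; decode to the nearest one
        d2 = ((y[:, None, :] - code[None, :, :]) ** 2).sum(axis=2)
        errors += np.count_nonzero(d2.argmin(axis=1) != i)
    return errors / (M * trials)  # equals the average of the per-word error rates

rng = np.random.default_rng(0)
code = random_code(M=4, n=8, energy=800.0, rng=rng)   # e.g. PT = 800
pe_quiet = estimate_pe(code, noise_var=1.0, trials=2000, rng=rng)
pe_noisy = estimate_pe(code, noise_var=1e6, trials=2000, rng=rng)
print(pe_quiet, pe_noisy)
```

With small noise the estimate is essentially zero. With overwhelming noise the decoder is close to guessing, so $P_e$ approaches $1 - 1/M$.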



A transmission rate $R$ (nats per second) is said to be permissible if
for every $\lambda > 0$ one can find a $T$ sufficiently large and a code with
parameter $T$ with $M = [e^{RT}]$ code words and $P_e \le \lambda$. With such a code,
the system could process $R_b = R/\ln 2$ bits per second. We define the
channel capacity $C$ as the supremum of permissible rates. For the channel
under discussion the channel capacity is given by the celebrated formula
Fig. 1. Time-discrete channel.
$$ C = C_\alpha = \frac{\alpha}{2} \ln\left(1 + \frac{P}{\alpha N}\right). \qquad (4) $$



In order to establish $C$ as the capacity, one must prove two theorems.
The first ("direct half") states that any $R < C$ is a permissible rate;
that is, there exist codes with vanishingly small $P_e$ as $T \to \infty$. The
second theorem ("weak converse") states that no $R > C$ is a permissible
rate; that is, for any sequence of codes with rate $R > C$, $P_e$ is
bounded away from zero. This has been done for the present channel
for the case of finite $\alpha$ by Shannon.^{1,2,3} Let us observe that if we let
$\alpha \to \infty$ in (4), we have $C_\alpha \to P/2N$. The fact that $C_\infty = P/2N$ has been
established by Ash.^4 The reader is referred to Ash [Ref. 5, Chapter 8]
for a complete discussion of the above. The significance of the channel
capacity, then, is that it is the maximum rate for which arbitrarily high
reliability is possible using signals in a certain class (i.e., those which
satisfy (1)) with sufficiently long delay $T$.
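Formula (4) and its limit as $\alpha \to \infty$ can be evaluated directly. A minimal sketch (the function name `c_alpha` is ours), treating $\alpha = \infty$ as the limiting value $P/2N$:

```python
import math

def c_alpha(alpha, P, N):
    """Capacity (4) in nats per second: (alpha/2) ln(1 + P/(alpha N))."""
    if math.isinf(alpha):
        return P / (2 * N)  # the limiting value C_infinity = P/2N
    return 0.5 * alpha * math.log1p(P / (alpha * N))

# C_alpha increases with alpha and approaches P/2N from below
for alpha in (1, 10, 100, 1e4, 1e8, math.inf):
    print(alpha, c_alpha(alpha, P=1.0, N=1.0))
```

`math.log1p` is used so that the large-$\alpha$ values, where $P/(\alpha N)$ is tiny, are computed without loss of precision.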

2.2 Application to the Band-Limited Gaussian Channel 

Shannon^{1,2} has applied the above results to the communication system
of Fig. 2. As above, the message source emits binary digits at the rate of
$R_b$ per second, and after $T$ seconds, one of $M = 2^{R_b T}$ possible messages
appears at the coder input. Corresponding to the $i$th message ($i = 1, 2,
\ldots, M$) the coder output is the function

$$ x_i(t) = \sum_{k=1}^{n} x_{ik}\, \delta(t - k/2W), \qquad (5a) $$

where $\delta(t)$ is the unit impulse, $n = [2WT]$, and the $\{x_{ik}\}_{k=1}^{n}$ satisfy

$$ \sum_{k=1}^{n} x_{ik}^2 \le 2WP_0T, \qquad i = 1, 2, \ldots, M. \qquad (5b) $$

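As for the time-discrete channel, the coder must contain a set of $M$ real
$n$-sequences. The channel input $s_i(t)$ is the result of passing $x_i(t)$ through
an ideal low-pass filter with transfer function

$$ H(\omega) = \begin{cases} 1/2W, & |\omega| \le 2\pi W, \\ 0, & |\omega| > 2\pi W, \end{cases} \qquad (6) $$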
Fig. 2. Shannon's time-continuous band-limited channel.
so that

$$ s_i(t) = \sum_{k=1}^{n} x_{ik}\, \frac{\sin 2\pi W(t - k/2W)}{2\pi W(t - k/2W)}. \qquad (7) $$

Thus, it takes $T$ seconds to generate the filter input, and the system can
process information at a rate of $R = (\ln 2)R_b$ nats per second without
a "backup" at the coder input. Let us also remark that although the
signal $s_i(t)$ is generated in $T$ seconds, due to the physical unrealizability
of $H(\omega)$, $s_i(t)$ is nonzero almost everywhere on $(-\infty, \infty)$. This leads to
a fundamental difficulty which we shall discuss later.

Let $s(t)$ be the input to the channel due to a repeated application of
the coding process (every $T$ seconds). Then $s(t)$ is band-limited to $W$
cycles per second, and

$$ \lim_{\tau \to \infty} \frac{1}{2\tau} \int_{-\tau}^{\tau} s^2(t)\,dt \le P_0. \qquad (8) $$

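Inequality (8) follows from (5b) and the orthogonality of

$$ \frac{\sin 2\pi W(t - k/2W)}{2\pi W(t - k/2W)} \quad \text{and} \quad \frac{\sin 2\pi W(t - k'/2W)}{2\pi W(t - k'/2W)} $$

($k \ne k'$) on the infinite interval $(-\infty, \infty)$: since each such pulse has
energy $1/2W$, each code word contributes energy $(1/2W)\sum_k x_{ik}^2 \le P_0 T$
per $T$ seconds. Thus, the channel input is a band-limited signal with
"average power" not exceeding $P_0$.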

Again turning our attention to Fig. 2, the channel output is a function
$y(t) = s(t) + z(t)$, where $z(t)$ is a sample from a Gaussian random
process with spectral density

$$ N(\omega) = \begin{cases} N_0/2, & |\omega| \le 2\pi W, \\ 0, & |\omega| > 2\pi W. \end{cases} \qquad (9a) $$

The corresponding autocorrelation function of the noise is

$$ R(\tau) = \mathcal{E}[z(t)z(t+\tau)] = N_0 W \frac{\sin 2\pi W\tau}{2\pi W\tau}, \qquad (9b) $$

where $\mathcal{E}$ denotes expectation.
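Two properties of (9b) are used below: it is the inverse Fourier transform of (9a), and it vanishes at $\tau = k/2W$ for $k \neq 0$ while equalling $N_0 W$ at $\tau = 0$. A minimal numerical check follows. It assumes the convention $R(\tau) = (1/2\pi)\int N(\omega) e^{i\omega\tau}\,d\omega$ and a midpoint rule for the integral. The function names and parameter values are illustrative only.

```python
import numpy as np

def r_closed_form(tau, N0, W):
    """(9b): N0 W sin(2 pi W tau)/(2 pi W tau); np.sinc(x) is sin(pi x)/(pi x)."""
    return N0 * W * np.sinc(2 * W * tau)

def r_from_spectrum(tau, N0, W, m=20000):
    """Midpoint rule for (1/2pi) * integral of N(omega) exp(i omega tau) over |omega| <= 2 pi W,
    with N(omega) = N0/2 as in (9a); the sine part integrates to zero by symmetry."""
    d = 4 * np.pi * W / m
    omega = -2 * np.pi * W + d * (np.arange(m) + 0.5)
    return (N0 / 2) * np.cos(omega * tau).sum() * d / (2 * np.pi)

N0, W = 2.0, 1.5
for tau in (0.0, 0.1, 0.37, 1.2):
    print(tau, r_closed_form(tau, N0, W), r_from_spectrum(tau, N0, W))
# at the sampling instants k/2W, k != 0, the correlation vanishes
print([float(r_closed_form(k / (2 * W), N0, W)) for k in range(1, 6)])
```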

Again it is the function of the receiver (or decoder) to examine $y(t)$
and determine what the input information was. Let us consider the
signal $s_i(t)$ of (7), which was generated during the interval $[0, T]$. The
coefficients $\{x_{ik}\}_{k=1}^{n}$ are the values of $s_i(t)$ at the "sampling instants"
$t = k/2W$, $k = 1, 2, \ldots, n$. Since the noise is also band-limited, the
received signal $y(t)$ is band-limited and may be completely characterized
by its values at the sampling instants, $y_k = y(k/2W)$, $k = 0, \pm 1, \pm 2, \ldots$.
Clearly

$$ y_k = x_{ik} + z_k, \qquad k = 1, 2, \ldots, n, \qquad (10) $$

where $z_k = z(k/2W)$ is the value of the noise $z(t)$ at the sampling
instant $t = k/2W$. Since $s_i(k/2W) = 0$ for $k < 1$ and $k > n$, the only
useful samples of $y$ are $\{y_k\}_{k=1}^{n}$. Further, it follows directly from (9b)
that the $z_k$ are independent, normally distributed random variables
with mean zero and variance $N_0 W$ (they are uncorrelated because $R(k/2W) = 0$
for $k \ne 0$, and uncorrelated jointly Gaussian variables are independent).
Thus, it suffices to consider the input and output as $n$-sequences
$\mathbf{x}_i = (x_{i1}, x_{i2}, \ldots, x_{in})$ and $\mathbf{y} = (y_1, \ldots, y_n)$ ($n = 2WT$) related by (10).
Let us remark here that the code words corresponding to previous and
successive intervals will not cause any interference with the code word
corresponding to the interval $[0, T]$, since these other code words are zero
at the sampling instants.
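The interpolation property used above, that the pulses in (7) vanish at every sampling instant but their own, can be checked with a minimal sketch. Here `synthesize` is a hypothetical helper, and the code word is drawn at random purely for illustration.

```python
import numpy as np

def synthesize(x, W, t):
    """s_i(t) from (7): sum over k = 1..n of x_ik sin 2piW(t - k/2W) / (2piW(t - k/2W))."""
    k = np.arange(1, len(x) + 1)
    # np.sinc(u) = sin(pi u)/(pi u), and 2W(t - k/2W) = 2Wt - k
    return (x[None, :] * np.sinc(2 * W * t[:, None] - k[None, :])).sum(axis=1)

rng = np.random.default_rng(1)
W = 3.0
x = rng.normal(size=6)                  # a hypothetical code word, n = 6
k_all = np.arange(-3, 11)               # sampling instants k/2W inside and outside 1..n
samples = synthesize(x, W, k_all / (2 * W))
print(np.round(samples, 12))            # x_1..x_6 at k = 1..6, zero elsewhere
```

The sample values at $k = 1, \ldots, n$ reproduce the code word, and the other instants see no contribution from it. This is why code words sent in neighbouring intervals do not interfere at the sampling instants.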

Inequality (5b) and relation (10) permit us to apply the results for the
time-discrete Gaussian channel discussed above with parameters $\alpha = 2W$,
$P = 2WP_0$, and $N = N_0 W$. We conclude that this communication
system (in Fig. 2) is capable of processing information at any rate $R$
less than

$$ C = W \ln\left(1 + \frac{P_0}{N_0 W}\right), \qquad (11) $$

with vanishingly small error probability as $T$ becomes large. Since the
channel inputs are band-limited to $W$ cycles per second, and by (8)
have average power not exceeding $P_0$, it is generally believed that the
capacity (taken as the maximum "error-free rate") of a channel which
admits only band-limited signals with average power $P_0$ is given by (11).
In fact, it has only been shown that it is possible to do at least as well 
as C (using the system of Fig. 2), and no converse has been proven. This 
is the first difficulty with the Shannon model which we shall attempt to 
remedy. 
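The substitution $\alpha = 2W$, $P = 2WP_0$, $N = N_0 W$ that turns (4) into (11) can be confirmed numerically. The following is a minimal sketch with hypothetical parameter values. It also converts nats per second to bits per second by dividing by $\ln 2$.

```python
import math

def time_discrete_capacity(alpha, P, N):
    """(4): (alpha/2) ln(1 + P/(alpha N)) nats per second."""
    return 0.5 * alpha * math.log1p(P / (alpha * N))

def shannon_capacity(P0, N0, W):
    """(11): W ln(1 + P0/(N0 W)) nats per second."""
    return W * math.log1p(P0 / (N0 * W))

W, P0, N0 = 3000.0, 1e-3, 1e-8          # hypothetical bandwidth, power, noise density
c4 = time_discrete_capacity(alpha=2 * W, P=2 * W * P0, N=N0 * W)
c11 = shannon_capacity(P0, N0, W)
print(c4, c11, c11 / math.log(2))       # nats/s, nats/s, bits/s
```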

Further, there are other difficulties inherent in the use of this model. 
We are taking "capacity" to be a (maximum) transmission rate, but 
what is the rate for the system of Fig. 2? We have said merely that the 
coder can process information at a rate of $R$ nats per second. However,
because of the physical unrealizability of $H(\omega)$, we must discard all
temporal notions about the channel input $s_i(t)$ as well as the output
$y(t)$. The notion of rate, therefore, has only a limited meaning. In fact,
since the received signal y(t) is an entire function, it is perfectly pre- 
dictable for all time from observations over a finite interval. Thus the 
receiver, by observing y(t) in a tiny interval, could extrapolate y(t) 
for all time and obtain sample values at an arbitrarily high rate. This 
anomaly is the second difficulty with the Shannon model. 

It is the purpose of this paper to present a model for the time-continuous
band-limited Gaussian channel for which the capacity (defined
as the maximum "error-free rate") is given by (11). This will necessitate
proving a "direct half" and "converse" to a coding theorem. Further,
the model should avoid the second difficulty mentioned above. We shall
obtain results of the following form:

Let $a(T,W,P_0)$ be a class of functions which are "approximately
band-limited to $W$ cycles per second and approximately time-limited
to $T$ seconds", and which have total "energy" not exceeding $P_0 T$.
The noise is taken to be stationary and Gaussian with spectral density
given (or "approximately" given) by (9a). Then the channel capacity,
defined as the maximum rate for which arbitrarily high reliability is
possible (using signals from $a$) as $T$ becomes large, is given
"approximately" by $W \ln(1 + P_0/(N_0 W))$. The term "approximately" used
here will, of course, be given a precise meaning below.

III. SUMMARY OF RESULTS 

We shall propose four models for the channel and find the capacity
of each. Each model is of the following form:

(i) Definition of a suitable class of allowable signal functions,
$a(T,W,P_0)$, which are "approximately band-limited to $W$ cycles
per second, approximately time-limited to $T$ seconds", and with
total energy not exceeding $P_0 T$.
(ii) Definition of the noise, taken to be stationary additive Gaussian
noise with spectral density $N(\omega)$, which is "approximately"
given by (9a).

We shall take $W$ and $P_0$ to be fixed parameters. A code with parameter
$T$ is a set of $M$ functions (called code words) in $a(T,W,P_0)$. The transmission
rate $R$ is defined by $R = (1/T)\ln M$, so that $M = e^{RT}$. A decoding
scheme is a mapping of the space of possible received signals (code word
plus a noise sample) onto the code. If code word $i$ ($i = 1, 2, \ldots, M$)
is transmitted, we take $P_{ei}$ to be the conditional probability that the
decoder chooses a code word other than $i$, and hence makes an error.
Since all code words are equally likely to be transmitted, the over-all
error probability $P_e$ is given by (3), i.e.,

$$ P_e = \frac{1}{M} \sum_{i=1}^{M} P_{ei}. $$

A transmission rate $R$ is said to be permissible if for every $\lambda > 0$ one
can find a $T$ sufficiently large and a code with $M = [e^{RT}]$ code words for
which $P_e \le \lambda$. The channel capacity $C$ is defined as the supremum of
permissible rates. We shall find the capacity corresponding to a number
of different $a(T,W,P_0)$ and $N(\omega)$. This will, as for the time-discrete
channel, necessitate proving two coding theorems: a "direct half"
and a "weak converse".

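Before beginning the summary we shall need the following definitions.
Let $s(t)$, $-\infty < t < \infty$, be a real-valued square-integrable function
and $S(\omega)$ be its Fourier transform. Let the norm of $s(t)$ be

$$ \|s\| = \left[\int_{-\infty}^{\infty} s^2(t)\,dt\right]^{1/2}. \qquad (12) $$

The frequency and time "concentrations" of $s$ are

$$ K_B(s, 2\pi W) = \frac{1}{2\pi}\int_{-2\pi W}^{2\pi W} |S(\omega)|^2\,d\omega \Big/ \|s\|^2 \qquad (13a) $$

and

$$ K_D(s, T) = \int_{-T/2}^{T/2} s^2(t)\,dt \Big/ \|s\|^2, \qquad (13b) $$

respectively. Further, let $D_T$ be the "time-truncation" operator defined
by

$$ D_T s(t) = \begin{cases} s(t), & |t| \le T/2, \\ 0, & |t| > T/2. \end{cases} \qquad (14) $$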
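The two concentrations (13a) and (13b) can be estimated from samples of a signal. In the sketch below, the time ratio is a Riemann sum, and the frequency ratio is taken on the squared magnitude of the DFT, which by discrete Parseval gives the same energy fraction. The Gaussian test pulse is our choice: for $s(t) = e^{-t^2/2\sigma^2}$ the closed forms are $K_B(s, 2\pi W) = \mathrm{erf}(2\pi W\sigma)$ and $K_D(s, T) = \mathrm{erf}(T/2\sigma)$. The grid and tolerances are assumptions for illustration only.

```python
import math
import numpy as np

def concentrations(s, t, T, W):
    """Return (K_B(s, 2 pi W), K_D(s, T)) for samples s of a real signal on a uniform grid t."""
    # K_D, (13b): fraction of ||s||^2 lying in |t| <= T/2 (grid spacing cancels)
    k_d = np.sum(s[np.abs(t) <= T / 2] ** 2) / np.sum(s ** 2)
    # K_B, (13a): fraction of energy in |f| <= W cycles per second, i.e. |omega| <= 2 pi W
    power = np.abs(np.fft.fft(s)) ** 2
    f = np.fft.fftfreq(len(s), d=t[1] - t[0])    # frequencies in cycles per second
    k_b = np.sum(power[np.abs(f) <= W]) / np.sum(power)
    return k_b, k_d

sigma, W, T = 0.1, 2.0, 0.3                       # hypothetical test values
t = np.linspace(-20.0, 20.0, 2**16, endpoint=False)
s = np.exp(-t**2 / (2 * sigma**2))                # Gaussian pulse
k_b, k_d = concentrations(s, t, T, W)
print(k_b, math.erf(2 * math.pi * W * sigma))
print(k_d, math.erf(T / (2 * sigma)))
```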

With these definitions in hand, we are able to state our results. In each 
case we shall define the channel model and then give the channel ca- 
pacity. Although there are some difficulties inherent in these models, 
each model leads to a mathematical theorem which justifies Shannon's 
capacity formula. 

Model 1: To begin with, let us take for the set $a$ of "allowable" inputs
$a_1(T,W,P_0)$, the set of functions $s(t)$ satisfying

$$ s(t) = 0, \qquad |t| > T/2, \qquad (15a) $$
$$ \|s\|^2 \le P_0 T, \qquad (15b) $$
$$ K_B(s, 2\pi W) \ge 1 - \eta \quad (0 < \eta < 1). \qquad (15c) $$

Hence, our allowable signals are functions which are strictly time-limited
and approximately band-limited. As $\eta \to 0$, the allowable signals become
more perfectly band-limited. The noise spectrum is taken to be
$$ N(\omega) = \begin{cases} N_0/2, & |\omega| \le 2\pi W, \\ \nu N_0/2, & |\omega| > 2\pi W, \end{cases} \qquad (16) $$

where $0 < \nu \le 1$. As $\nu \to 0$, (16) is in some sense "approximately" the
same as (9a). The average noise power outside the band ($|\omega| > 2\pi W$),
however, is infinite. In this case, Theorem 3 establishes

$$ C = C_\eta = W \ln\left[1 + (1 - \eta)\frac{P_0}{N_0 W}\right] + \frac{\eta P_0}{\nu N_0} \qquad (17) $$

as the channel capacity. As $\eta \to 0$, the capacity approaches the classical
formula $W \ln(1 + P_0/(N_0 W))$.

The principal difficulty with this model is the assumption of infinite
average noise power, which is hardly a physically acceptable notion.
Further, there are mathematical difficulties inherent in a spectral
density given by (16), which implies a covariance containing an impulse
function. Often the assumption of a spectrum as in (16) can be justified
by the fact that it can be approximated as closely as desired in the
frequency range of interest by a spectrum with finite power. However,
the following theorem, the proof of which is given in Appendix B, renders this
justification meaningless in this case.

Theorem 5: Let $a(T,W,P_0)$ be as in (15) and let the noise be additive and
Gaussian with spectral density $N(\omega)$, where

$$ \int_{-\infty}^{\infty} N(\omega)\,d\omega < \infty. $$

Then the capacity $C_\eta = \infty$ regardless of how small $\eta$ may be.

Intuitively, we may see that this is true by observing that, since the 
above integral exists, N(u) must be arbitrarily small in some frequency 
range. Hence, by placing some signal energy into this frequency range, 
we can make the "signal-to-noise" ratio arbitrarily large, and therefore, 
the permissible rate of transmission arbitrarily high. 

Accordingly, we shall assume for the remaining models that the noise
is additive and Gaussian, with spectral density

$$ N(\omega) = \begin{cases} N_0/2, & |\omega| \le 2\pi W, \\ 0, & |\omega| > 2\pi W. \end{cases} \qquad (18) $$

This corresponds more closely with the usual formulation of a band-limited
channel. It remains to find a suitable class of input signals,
$a(T,W,P_0)$. We consider some possibilities.

Model 2: This model defines $a = a_2(T,W,P_0)$ as the set of functions $s(t)$
satisfying

$$ S(\omega) = 0, \qquad |\omega| > 2\pi W, \qquad (19a) $$
$$ \|s\|^2 \le P_0 T, \qquad (19b) $$
$$ K_D(s, T) \ge 1 - \eta \quad (0 < \eta < 1). \qquad (19c) $$

Thus, $a_2$ is a set of strictly band-limited, approximately time-limited
functions. As $\eta \to 0$, the allowable signals become more perfectly
time-limited. With the noise as defined in (18), Theorem 2 establishes

$$ C = C_\eta = W \ln\left[1 + (1 - \eta)\frac{P_0}{N_0 W}\right] + \frac{\eta P_0}{N_0} \qquad (20) $$

as the channel capacity. Again, as $\eta \to 0$, $C_\eta$ approaches the classical
formula $W \ln[1 + (P_0/N_0 W)]$.

Model 2 is an intuitively plausible model for the band-limited channel, 
and Theorem 2 which establishes its capacity is a mathematically rigor- 
ous result which, in the limit, yields the desired capacity formula. There 
are, however, two difficulties inherent in this formulation. The first
is that since the allowable signals $s(t)$ are band-limited, it is not possible
to generate them in finite time. Thus the central idea of a transmission
rate has, at best, a limited meaning. The Shannon model (Fig. 2) also 
suffers from this difficulty (see Section II). The other problem with 
this formulation is that if code words are transmitted sequentially, 
we will have an interference problem (i.e., the tails of successive signals 
will overlap), the resolution of which is not known at present. The 
following two models contain neither of these difficulties. 

Model 3: This model avoids the difficulties of Model 2 by letting the
code words be strictly time-limited and approximately band-limited.
However, as we have seen in Theorem 5, the definition of approximately
band-limited functions employed above in (15) yields an infinite capacity.
Thus we seek an alternate way of characterizing "approximately"
band-limited or "slowly changing" functions. We proceed as follows.
Let $x(t)$ be a function satisfying $x(t) = 0$ for $|t| > T/2$, and $\|x\|^2 < \infty$.
If $x = D_T \tilde{x}$, where $\tilde{x}$ is a strictly band-limited function and $D_T$ is defined
by (14), we may define a "frequency concentration" of $x$ by

$$ K_B'(x, 2\pi W) = \frac{\|x\|^2}{\|\tilde{x}\|^2}. \qquad (21) $$

If we cannot express $x$ as $D_T \tilde{x}$, we take $K_B' = 0$. For example, if $x(t)$
or any of its derivatives has even a small discontinuity inside $(-T/2, T/2)$,
then we cannot write $x = D_T \tilde{x}$, so that $K_B'(x, 2\pi W) = 0$ and $x$ is not
approximately band-limited in this sense. This is so no matter how large
$K_B(x, 2\pi W)$ may be. Conversely, it is shown in Appendix C that for any function $x$

$$ K_B(x, 2\pi W) \ge K_B'(x, 2\pi W), \qquad (22) $$

so that a $K_B'$ close to unity implies a $K_B$ close to unity. Thus, saying that
a function $x$ has a $K_B'$ close to unity implies that $x$ is "slowly changing"
and that $K_B$ is also close to unity.
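For a truncation $x = D_T\tilde{x}$ of a band-limited $\tilde{x}$, one short argument that $K_B$ dominates $K_B'$ uses the Cauchy-Schwarz inequality. This is a sketch of a bound of the kind quoted from Appendix C, not necessarily the argument given there. $P_W$ is our notation for the orthogonal projection onto functions band-limited to $W$ cycles per second.

```latex
\begin{aligned}
\langle x, \tilde{x} \rangle &= \int_{-T/2}^{T/2} \tilde{x}^2(t)\,dt = \|x\|^2
  && \text{since } x = D_T \tilde{x}, \\
\langle x, \tilde{x} \rangle &= \langle P_W x, \tilde{x} \rangle \le \|P_W x\|\,\|\tilde{x}\|
  && \text{since } P_W \tilde{x} = \tilde{x}, \\
\|P_W x\|^2 &= K_B(x, 2\pi W)\,\|x\|^2
  && \text{by (13a) and Parseval}, \\
\Rightarrow\quad \|x\|^4 &\le K_B(x, 2\pi W)\,\|x\|^2\,\|\tilde{x}\|^2,
  \quad\text{i.e.}\quad K_B(x, 2\pi W) \ge \frac{\|x\|^2}{\|\tilde{x}\|^2} = K_B'(x, 2\pi W).
\end{aligned}
```

When $x$ cannot be written as $D_T\tilde{x}$, $K_B' = 0$ and the bound holds trivially.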

We now choose the set $a = a_3(T,W,P_0)$ of allowable inputs as the
set of functions $s(t)$ for which

$$ s(t) = 0, \qquad |t| > T/2, \qquad (23a) $$
$$ \|s\|^2 \le P_0 T, \qquad (23b) $$
$$ K_B'(s, 2\pi W) \ge 1 - \eta \quad (0 < \eta < 1). \qquad (23c) $$

Thus $a_3$ is a set of strictly time-limited and approximately band-limited
functions. In this case, Theorem 4 establishes

$$ C = C_\eta = W \ln\left(1 + \frac{P_0}{N_0 W}\right) + \frac{\eta}{1 - \eta}\,\frac{P_0}{N_0} \qquad (24) $$

as the channel capacity. Again $C_\eta \to W \ln[1 + (P_0/N_0 W)]$ as $\eta \to 0$.

The significance of constraint (23c) is that it makes it impossible 
for the communicator to make any use of the high-frequency components 
which must of necessity be included in the signal (since it is time- 
limited). Model 3, therefore, provides a mathematically rigorous 
theorem which does not involve any complications concerning physical 
realizability, and yields the desired capacity. 

Our final formulation is as follows: 

Model 4: Let $a = a_4(T,W,P_0)$ be the set of strictly time-limited,
approximately band-limited functions $s(t)$ which satisfy

$$ s(t) = 0, \qquad |t| > T/2, \qquad (25a) $$
$$ \|s\|^2 \le P_0 T, \qquad (25b) $$
$$ K_B(s, 2\pi W) \ge 1 - \eta. \qquad (25c) $$

Now Theorem 5 (stated above) tells us that if the noise were as in
(18), the capacity would be infinite. In actuality one could not be sure that
the noise was absolutely band-limited. In fact, whether or not the noise is
strictly band-limited is not verifiable in the laboratory. It is reasonable,
therefore, to assume that the noise is given by $z(t) = z_1(t) + z_2(t)$,
where $z_1(t)$ is a sample from a Gaussian random process with spectral
density (18). For $z_2(t)$ we require only that

$$ \int_{-T/2}^{T/2} z_2^2(t)\,dt \le \nu N_0 WT, \qquad (26) $$

where $\nu > 0$ is small. We place no other restrictions on the spectrum of
$z_2$ or on its probability structure. Since the expected value of the energy
of $z_1(t)$ in $[-T/2, T/2]$ is $N_0 WT$, (26) implies that the energy of $z(t)$
is nearly all in $z_1(t)$ ($\nu \ll 1$). We shall assume that $z_2(t)$ may depend
on the code and decoding rule used, on the code word transmitted,
and on the sample $z_1(t)$. We require our communication system to perform
well no matter what $z_2(t)$ may be.

Let us say that a code (satisfying (25)) and a decoding rule have
been chosen. Let us also assume that the rule for selecting $z_2(t)$ has
been chosen. Let $P_e(z_2)$ be the resulting error probability. Then define

$$ P_e = \max_{z_2} P_e(z_2), \qquad (27) $$

where the maximization in (27) is over all rules for choosing $z_2(t)$,
with the code and decoding rule fixed. The channel capacity is the
supremum of those rates for which $P_e$ may be made to vanish as $T \to \infty$.

It can be shown (see Appendix D) that the capacity $C$ is given by

$$ C = C_{\eta\nu} = W \ln\left(1 + \frac{P_0}{N_0 W}\right) + \epsilon(\eta, \nu), \qquad (28) $$

where $\epsilon(\eta, \nu) \to 0$ as $\eta, \nu \to 0$, provided $\nu/\eta > P_0/N_0 W$, the signal-to-noise
ratio. Since we may consider $\eta$ and $\nu$ to be limits on the accuracy
of our measuring equipment, the former on measuring the signal* and
the latter on measuring the noise, it is reasonable to assume, as we did
in (28), that $\eta$ and $\nu$ go to zero at the same rate.

An alternate and mathematically equivalent formulation of Model
4 is as follows: let the signals $s(t)$ be as in (25) and the noise $z(t)$ be
as in (18). Now in reality one could not expect the decoder to be
capable of infinitesimally accurate measurements. It is reasonable,
therefore, to assume that there is an inherent uncertainty in all measurements
made by the decoder, and to require that the communication
system perform well despite this uncertainty. Specifically, we require
that the decoding regions satisfy the following condition: if $y_1(t)$ is
decoded as $s_i$, and $y_2(t)$ is decoded as $s_j$ ($i \ne j$), then
* That is, $\eta$ represents a limit on the measurement of the frequency components of
the signal outside the band.
$$ \int_{-T/2}^{T/2} \left(y_1(t) - y_2(t)\right)^2 dt \ge 2\nu N_0 WT. \qquad (29) $$

In other words, if a received signal $y(t)$ is close to the "border" between
decoding regions, we cannot, because of the uncertainty in the accuracy
of our measurements, be sure to which region $y(t)$ belongs. Condition
(29) forces the decoder to give up on such a $y(t)$ and to announce an
error. The capacity for this alternate model is also given by (28). Let
us remark that here $\nu$ is again a measure of the accuracy of our measuring
instruments, this time at the decoder, so that again it is reasonable to
expect $\eta$ and $\nu$ to tend to zero at the same rate.

IV. PRELIMINARIES TO PROOFS 

4.1 The Product of Time-Discrete Channels 

The product or parallel combination of $r$ time-discrete Gaussian channels
is defined as follows. Every $T$ seconds the input to the channel is
an $r$-tuple $(\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(r)})$, where

$$ \mathbf{x}^{(i)} = \left(x_1^{(i)}, x_2^{(i)}, \ldots, x_{n_i}^{(i)}\right) \qquad (i = 1, 2, \ldots, r) $$

is a real $n_i$-vector ($n_i = \alpha_i T$, $\alpha_i$ a fixed parameter). Each vector $\mathbf{x}^{(i)}$
satisfies the energy constraint

$$ E(\mathbf{x}^{(i)}) = \sum_{k=1}^{n_i} \left(x_k^{(i)}\right)^2 \le P_i T, \qquad i = 1, 2, \ldots, r, \qquad (30) $$

where the $P_i > 0$ are fixed parameters. The channel output is also an
$r$-tuple $(\mathbf{y}^{(1)}, \ldots, \mathbf{y}^{(r)})$, where the $\mathbf{y}^{(i)}$ are $n_i$-vectors given by

$$ \mathbf{y}^{(i)} = \mathbf{x}^{(i)} + \mathbf{z}^{(i)}, \qquad (31) $$

where the $\mathbf{z}^{(i)}$ are $n_i$-vectors whose coordinates are independent Gaussian
random variables with mean zero and variance $N_i$ ($i = 1, 2, \ldots, r$).
Further, the $\{\mathbf{z}^{(i)}\}_{i=1}^{r}$ are statistically independent. Codes, permissible
rates of transmission, and channel capacity are defined as in Section II.
The following is proved in Ref. 6.

Lemma A: The capacity $C$ of the product of $r$ time-discrete Gaussian
channels, with parameters $(\alpha_i, P_i, N_i)$, $i = 1, 2, \ldots, r$, is given by
the sum of the capacities of the component channels:

$$ C = \sum_{i=1}^{r} \frac{\alpha_i}{2} \ln\left(1 + \frac{P_i}{\alpha_i N_i}\right). \qquad (32) $$

Equation (32) also holds when one or more of the $\alpha_i = \infty$. In this case
we read $x \ln[1 + (c/x)]|_{x=\infty}$ as $c$.
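Lemma A can be evaluated mechanically. The sketch below (the name `product_capacity` is hypothetical) sums the component capacities and uses the stated reading for $\alpha_i = \infty$, under which the term becomes $P_i/2N_i$:

```python
import math

def product_capacity(channels):
    """(32): sum over component channels of (alpha_i/2) ln(1 + P_i/(alpha_i N_i)).
    `channels` is a list of (alpha, P, N) tuples; alpha = math.inf is read as P/(2N)."""
    total = 0.0
    for alpha, P, N in channels:
        if math.isinf(alpha):
            total += P / (2 * N)
        else:
            total += 0.5 * alpha * math.log1p(P / (alpha * N))
    return total

# two finite channels and one with alpha = infinity (hypothetical parameters)
print(product_capacity([(2.0, 1.0, 1.0), (4.0, 2.0, 0.5), (math.inf, 3.0, 2.0)]))
```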

4.2 The Jointly-Constrained Product Channel 

We define the jointly-constrained product of time-discrete channels
exactly as the ordinary product, with constraint (30) replaced by constraints
of the following form:

Type 1: Let $r = 2$ and $N_1 = N_2 = N$, and instead of (30) we have

$$ E(\mathbf{x}) = E(\mathbf{x}^{(1)}) + E(\mathbf{x}^{(2)}) \le PT. \qquad (33a) $$

If $\alpha_1 \le \alpha_2$ we introduce an additional constraint on $\mathbf{x}^{(2)}$,

$$ E(\mathbf{x}^{(2)}) \le \eta E(\mathbf{x}), \qquad (33b) $$

where $\eta$ ($0 \le \eta \le 1$) is another fixed parameter. In other words, we have
constrained the total energy of the two input vectors (33a), and introduced
another constraint on the second input vector $\mathbf{x}^{(2)}$, requiring it
to have no more than a fraction $\eta$ of the total energy (33b). If $\alpha_2 \le \alpha_1$, we replace
(33b) by a similar constraint on $\mathbf{x}^{(1)}$.

Type 2: Let $r = 3$, $N_1 = N_2$, and $N_1 \ge N_3$. Further, let $\alpha_3 = \infty$.
Instead of (30) we require that $\mathbf{x}$ satisfy

$$ E(\mathbf{x}) = E(\mathbf{x}^{(1)}) + E(\mathbf{x}^{(2)}) + E(\mathbf{x}^{(3)}) \le PT, \qquad (34a) $$
$$ E(\mathbf{x}^{(3)}) \le \eta E(\mathbf{x}). \qquad (34b) $$

When $\alpha_2 = 0$ and $N_1 = N_3$, this reduces to a special case of Type 1.

Type 3: Let $r = 2$, $N_1 = N_2 = N$, and $\alpha_2 = \infty$. Instead of (30) require
$\mathbf{x}$ to satisfy

$$ E(\mathbf{x}^{(1)}) \le PT, \qquad (35a) $$
$$ E(\mathbf{x}^{(2)}) \le \eta E(\mathbf{x}). \qquad (35b) $$

We now ask: what is the capacity $C$ of these channels? The answer is
the following theorem, which is proved in Appendix A.

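Theorem 1: The capacity $C$ of the jointly-constrained product channel as
defined above is

Type 1 ($r = 2$, $N_1 = N_2 = N$):

$$ C = C_1((1 - \beta)P) + C_2(\beta P), \qquad (36) $$

where

$$ \beta = \min\left(\eta, \frac{\alpha_2}{\alpha_1 + \alpha_2}\right) \qquad (37a) $$

and

$$ C_i(P) = \frac{\alpha_i}{2} \ln\left(1 + \frac{P}{\alpha_i N}\right), \qquad i = 1, 2. \qquad (37b) $$

Again when $\alpha_i = \infty$, we interpret $x \ln[1 + (c/x)]|_{x=\infty} = c$. In particular,
when $\alpha_2 = \infty$ ($\alpha_1 < \infty$), $\beta = \eta$, so that (36) implies that we can
do no better than putting as much energy into Channel 2 as (33b)
will permit.

Type 2 ($r = 3$, $N_1 = N_2 \ge N_3$, $\alpha_3 = \infty$):

$$ C = \frac{\alpha_1}{2} \ln\left[1 + \frac{(1 - \eta)P}{(\alpha_1 + \alpha_2)N_1}\right] + \frac{\alpha_2}{2} \ln\left[1 + \frac{(1 - \eta)P}{(\alpha_1 + \alpha_2)N_1}\right] + \frac{\eta P}{2N_3}. \qquad (38) $$

Type 3 ($r = 2$, $N_1 = N_2 = N$, $\alpha_2 = \infty$):

$$ C = \frac{\alpha_1}{2} \ln\left(1 + \frac{P}{\alpha_1 N}\right) + \frac{\eta P}{2(1 - \eta)N}. \qquad (39) $$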
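The Type 1 formula (36)-(37a) states that the best split of the total energy gives Channel 2 the share $\alpha_2/(\alpha_1+\alpha_2)$ unless (33b) caps it at $\eta$. A minimal sketch compares that closed form with a brute-force search over admissible splits. It assumes $\alpha_1 \le \alpha_2$, so that (33b) constrains $\mathbf{x}^{(2)}$. The helper names and parameter values are hypothetical.

```python
import math

def c_i(alpha, P, N):
    """(37b): (alpha/2) ln(1 + P/(alpha N)); alpha = math.inf is read as P/(2N)."""
    if math.isinf(alpha):
        return P / (2 * N)
    return 0.5 * alpha * math.log1p(P / (alpha * N))

def type1_capacity(a1, a2, P, N, eta):
    """(36)-(37a), assuming a1 <= a2 so that (33b) constrains the second channel."""
    share = 1.0 if math.isinf(a2) else a2 / (a1 + a2)   # unconstrained optimal share
    beta = min(eta, share)
    return c_i(a1, (1 - beta) * P, N) + c_i(a2, beta * P, N)

def type1_brute_force(a1, a2, P, N, eta, steps=20000):
    """Maximize C_1((1-b)P) + C_2(bP) over a grid of fractions 0 <= b <= eta."""
    return max(c_i(a1, (1 - eta * j / steps) * P, N) + c_i(a2, eta * j / steps * P, N)
               for j in range(steps + 1))

for eta in (0.3, 0.9):                     # (33b) binding, then not binding
    print(eta, type1_capacity(2.0, 3.0, 5.0, 1.0, eta),
          type1_brute_force(2.0, 3.0, 5.0, 1.0, eta))
```

Because the objective is concave in the split, the grid maximum agrees with the closed form to within the grid resolution.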



4.3 Prolate Spheroidal Wave Functions 

The following material can be found in Ref. 7. Given any $W, T > 0$
we can find a countably infinite set of real functions $\{\psi_i(t)\}_{i=1}^{\infty}$, called
prolate spheroidal wave functions (PSWF), and a set of real positive
numbers

$$ 1 > \lambda_1 > \lambda_2 > \cdots \qquad (40) $$

with the following properties:*

(i) The $\psi_i(t)$ are band-limited to $W$ cycles per second, orthonormal
on the real line, and complete in the space of band-limited functions of
bandwidth $W$ cycles per second.

(ii) The restrictions of the $\psi_i(t)$ to the interval $[-T/2, T/2]$ are
orthogonal:

$$ \int_{-T/2}^{T/2} \psi_i(t)\psi_j(t)\,dt = \begin{cases} \lambda_i, & i = j, \\ 0, & i \ne j. \end{cases} \qquad (41) $$

* Note that the first PSWF is $\psi_1(t)$. In Ref. 7, on the other hand, the first PSWF
is $\psi_0(t)$.
The restrictions of the $\psi_i(t)$ are also complete in $L^2[-T/2, T/2]$, the
space of square-integrable functions on $[-T/2, T/2]$.

(iii) For all $t$, the $\psi_i(t)$ satisfy the integral equation

$$ \lambda_i \psi_i(t) = \int_{-T/2}^{T/2} \psi_i(s)\,\frac{\sin 2\pi W(t - s)}{\pi(t - s)}\,ds. \qquad (42) $$

Thus the $\lambda_i$ are the eigenvalues, and the $\psi_i$ the eigenfunctions, of the
integral equation (42). It follows immediately from (42) that the time-limited
functions $D_T \psi_i$ (see (14)) have frequency concentration (see
(13a))

$$ K_B(D_T \psi_i, 2\pi W) = \lambda_i, \qquad i = 1, 2, \ldots. \qquad (43) $$

It can be shown that the $\lambda_i$ and $\psi_i$ depend upon $W$ and $T$ only through
the product $WT$. Further,

(iv) For a fixed $\delta > 0$,

$$ \lambda_{[2WT(1-\delta)]} \to 1 \quad \text{as } WT \to \infty \qquad (44a) $$

and

$$ \lambda_{[2WT(1+\delta)]} \to 0 \quad \text{as } WT \to \infty. \qquad (44b) $$

Thus, roughly speaking, for large $WT$, approximately $2WT$ of the $\lambda_i$
are approximately unity, and the remainder are approximately zero.
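The behaviour described in (44) can be seen by discretizing (42). The sketch below uses a midpoint-rule (Nystrom-type) discretization. That choice is ours, not the text's, and `pswf_eigenvalues` is a hypothetical helper. It approximates the $\lambda_i$ for $2WT = 10$. The eigenvalues sum to $2WT$, which is the trace of the kernel over the interval, and roughly $2WT$ of them lie near unity, with the rest falling off quickly.

```python
import numpy as np

def pswf_eigenvalues(W, T, m=400):
    """Approximate the eigenvalues of (42) by discretizing the kernel
    sin 2piW(t - s) / (pi (t - s)) at m midpoints of [-T/2, T/2]."""
    h = T / m
    t = -T / 2 + h * (np.arange(m) + 0.5)
    # sin(2 pi W d)/(pi d) = 2W * np.sinc(2W d); its value at d = 0 is 2W
    kernel = 2 * W * np.sinc(2 * W * (t[:, None] - t[None, :]))
    return np.linalg.eigvalsh(h * kernel)[::-1]   # descending, like lambda_1 > lambda_2 > ...

lam = pswf_eigenvalues(W=5.0, T=1.0)              # 2WT = 10
print(np.round(lam[:14], 4))
print(lam.sum())                                  # close to 2WT = 10
```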

4.4 Karhunen-Loeve Expansion 

Let z(t) be a Gaussian random process with spectral density N(w) 
given by (18). Then, using the Karhunen-Loeve Theorem 8 , we may 

write z(t) as 

«tt)-E ***(<), -|i*^5, (45) 

where the ^*(<) are PSWF's, and the z k are independent random varia- 
bles which are normally distributed with mean zero and variance N /2. 
The sum in (45) converges to z(t) with probability 1 for every t. 

If $N(\omega)$ is given by (16), then we may formally represent z(t) by

$$z(t) = \sum_{k=1}^{\infty} \frac{z_k}{\sqrt{\lambda_k}}\,\psi_k(t), \qquad |t| \le \frac{T}{2}, \qquad (46)$$

where the $\lambda_k$ are the eigenvalues of the PSWF's (40), and the $z_k$ are independent normally distributed random variables with mean zero and variance

$$\mathcal{E}(z_k^2) = \frac{N_0}{2}\left[\lambda_k(1-\nu) + \nu\right].$$

Thus from (44), roughly speaking, for large WT approximately 2WT of the $z_k$ have variance $N_0/2$, and the remainder variance $\nu N_0/2$.
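The variance formula makes the split between the two groups of coefficients concrete. A minimal sketch, assuming the eigenvalues are already available (the list below is made up for illustration, not computed PSWF eigenvalues), evaluates $\mathcal{E}(z_k^2)$ for each k. Coefficients with $\lambda_k$ near 1 get variance near $N_0/2$, and those with $\lambda_k$ near 0 get variance near $\nu N_0/2$.

```python
from typing import List


def kl_noise_variances(lams: List[float], N0: float, nu: float) -> List[float]:
    """Variances E(z_k^2) = (N0/2) * [lambda_k * (1 - nu) + nu] of the coefficients in (46)."""
    return [0.5 * N0 * (lam * (1.0 - nu) + nu) for lam in lams]


# Hypothetical eigenvalues: close to 1 for the first indices, close to 0 afterwards.
lams = [0.9999, 0.999, 0.98, 0.5, 0.02, 0.0001]
variances = kl_noise_variances(lams, N0=2.0, nu=0.1)
for lam, v in zip(lams, variances):
    print(f"lambda = {lam:<7} variance = {v:.5f}")
```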

V. PROOFS OF THE THEOREMS 

The general idea of the proofs in this section is as follows. All the time-continuous input signals (i.e., members of $\mathcal{A}(T,W,P_0)$) can be written in a Fourier series in PSWF's in which, roughly speaking, the first 2WT terms correspond to the part of the signal which is simultaneously approximately confined to the frequency band $|\omega| \le 2\pi W$ and to the time interval $|t| \le T/2$. The noise sample z(t) may also be written in a Karhunen-Loeve expansion in PSWF's. The result is to reduce the time-continuous channel to a jointly-constrained product of time-discrete channels (discussed in Section 4.2). Channel 1 corresponds to the first 2WT PSWF's, so that the parameter $a_1 = 2W$. Channel 2 corresponds to the remaining PSWF's, so that $a_2 = \infty$. The energy requirement on the time-continuous signal, $\|s\|^2 \le P_0T$, yields a joint energy constraint for the product channels (as in (33a), for example), and the requirement that the energy outside the frequency band (or time interval) be small yields a second energy constraint on the input to Channel 2 (as in (33b), for example). Application of Theorem 1 then yields the desired theorems. In the remainder of this section we shall make these ideas precise.

We begin by establishing the capacity of the channel defined by Model 
2. 

Theorem 2: Let the allowable signal set be $\mathcal{A}_2(T,W,P_0)$, the set of functions s(t) satisfying

$$S(\omega) = 0, \quad |\omega| > 2\pi W, \qquad (47a)$$

$$\|s\|^2 \le P_0T, \qquad (47b)$$

$$K_D(s,T) \ge 1 - \eta \quad (0 < \eta < 1). \qquad (47c)$$

The noise is a sample from a Gaussian random process with spectral density

$$N(\omega) = \begin{cases} N_0/2, & |\omega| \le 2\pi W, \\ 0, & |\omega| > 2\pi W. \end{cases} \qquad (48)$$



Then the channel capacity is

$$C = C_2 = W\ln\left(1 + (1-\eta)\frac{P_0}{N_0W}\right) + \eta\,\frac{P_0}{N_0}. \qquad (49)$$



376 THE BELL SYSTEM TECHNICAL JOURNAL, MARCH 1966



Proof:

(i) Direct Half: Let R be given satisfying

$$R < W\ln\left(1 + (1-\eta)\frac{P_0}{N_0W}\right) + \eta\,\frac{P_0}{N_0}. \qquad (50)$$

Since the right member of (50) is continuous in η and W, we may find a $\delta > 0$ and a $\sigma > 0$ sufficiently small so that

$$R < W(1-\delta)\ln\left[1 + (1-\eta+\sigma)\frac{P_0}{N_0W(1-\delta)}\right] + (\eta-\sigma)\frac{P_0}{N_0} = C^*.$$



We see from (36) that $C^*$ is the capacity of a type 1 jointly constrained product channel with parameters

$$P = P_0,\quad N = N_0/2,\quad \tilde\eta = \eta - \sigma,\quad a_1 = 2W(1-\delta),\quad a_2 = \infty. \qquad (51)$$
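The choice of δ and σ can be checked numerically. The sketch below, with hypothetical function names and parameter values, evaluates $C^* = C_1((1-\tilde\eta)P) + C_2(\tilde\eta P)$ for the parameters (51), using $C_i(x) = (a_i/2)\ln(1 + x/(a_iN))$ and its limit $x/(2N)$ for $a_2 = \infty$ (the form assumed here for the component capacities of Section 4.2), and compares the result with the right member of (50).

```python
import math


def c_component(a: float, x: float, N: float) -> float:
    """(a/2) ln(1 + x/(aN)) nats/s; for a = infinity, its limit x/(2N)."""
    if math.isinf(a):
        return x / (2.0 * N)
    return 0.5 * a * math.log(1.0 + x / (a * N))


def c_star_51(W: float, P0: float, N0: float, eta: float, delta: float, sigma: float) -> float:
    """C* = C_1((1 - eta_t)P) + C_2(eta_t P) for the parameters (51), with a_2 infinite."""
    a1, N, eta_t = 2.0 * W * (1.0 - delta), N0 / 2.0, eta - sigma
    return c_component(a1, (1.0 - eta_t) * P0, N) + c_component(math.inf, eta_t * P0, N)


def c2_rhs(W: float, P0: float, N0: float, eta: float) -> float:
    """Right member of (50): W ln(1 + (1 - eta) P0/(N0 W)) + eta P0/N0."""
    return W * math.log(1.0 + (1.0 - eta) * P0 / (N0 * W)) + eta * P0 / N0


W, P0, N0, eta = 1000.0, 1.0, 1e-3, 0.05  # hypothetical values, P0/(N0 W) = 1
for d in (0.1, 0.01, 0.001):
    print(d, c_star_51(W, P0, N0, eta, delta=d, sigma=d), c2_rhs(W, P0, N0, eta))
```

As δ and σ shrink, $C^*$ rises toward the right member of (50), which is why any R below that value can be placed under some $C^*$.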

We now show how to construct codes for the time-continuous channel with rate R and with vanishingly small error probability (as $T \to \infty$). Let $x = (x^{(1)}, x^{(2)})$ be an allowable input vector for the type 1 time-discrete product channel with parameters given by (51). Then the corresponding input for the time-continuous channel is

$$s(t) = \sum_{k=1}^{a_1T} x_k^{(1)}\,\psi_k(t) + \sum_{k=1}^{a_2T} x_k^{(2)}\,\psi_{k+a_1T}(t), \qquad a_1T = 2W(1-\delta)T,\ a_2T = \infty, \qquad (52)$$

where the $\{\psi_i(t)\}_{i=1}^{\infty}$ are the PSWF's (Section 4.3) with parameters W and T. We first verify that signals of the form of (52) are allowable inputs, i.e., belong to $\mathcal{A}_2(T,W,P_0)$ and satisfy (47). That the s(t) are bandlimited and satisfy (47a) follows from the fact that the PSWF's have this property (Section 4.3). Further, the energy of s(t) satisfies

$$\|s\|^2 = \sum_{k=1}^{a_1T}\big[x_k^{(1)}\big]^2 + \sum_{k=1}^{\infty}\big[x_k^{(2)}\big]^2 = E(x) \le P_0T, \qquad (53)$$

where use has been made of the orthonormality of the PSWF's on $(-\infty, \infty)$ (Section 4.3 (i)) and the joint energy constraint on x (33a). Thus s(t) satisfies (47b). Finally, from the orthogonality of the PSWF's on $[-T/2, T/2]$ (41) and the monotonicity of the $\lambda_k$ (40), we have

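$$\begin{aligned} 1 - K_D(s,T) &= \frac{\|(I - D_T)s\|^2}{\|s\|^2} = \sum_{k=1}^{a_1T}(1-\lambda_k)\frac{[x_k^{(1)}]^2}{E(x)} + \sum_{k=1}^{\infty}(1-\lambda_{a_1T+k})\frac{[x_k^{(2)}]^2}{E(x)} \\ &\le \left[1 - \lambda_{2WT(1-\delta)}\right]\frac{E(x^{(1)})}{E(x)} + \frac{E(x^{(2)})}{E(x)}. \end{aligned} \qquad (54)$$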
Now since $\lambda_{2WT(1-\delta)} \to 1$ as $T \to \infty$ (44a), and $E(x^{(1)})/E(x) \le 1$, with T sufficiently large we have

$$\left[1 - \lambda_{2WT(1-\delta)}\right]\frac{E(x^{(1)})}{E(x)} < \sigma.$$

Since $E(x^{(2)})$ must satisfy (33b) (with $\tilde\eta = \eta - \sigma$), we have (with T sufficiently large)

$$1 - K_D(s,T) \le \sigma + \eta - \sigma = \eta, \qquad (55)$$

so that s(t) satisfies (47c). Thus s(t) belongs to $\mathcal{A}_2(T,W,P_0)$.
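The bound (54) can be evaluated directly from the coefficients of s(t). The sketch uses a synthetic, decreasing sequence in place of the $\lambda_k$ (an assumption for illustration, not computed PSWF eigenvalues) and a random input x whose second component satisfies (33b) with equality. It computes $1 - K_D(s,T) = \sum_k (1-\lambda_k)x_k^2/E(x)$, which follows from (41), and compares it with the right member of (54).

```python
import numpy as np

rng = np.random.default_rng(0)

n1, n2 = 40, 60  # hypothetical: a1*T coordinates in channel 1, a truncated channel 2
k = np.arange(1, n1 + n2 + 1)
# Synthetic decreasing stand-ins for lambda_k (an assumption, not computed PSWF eigenvalues).
lam = 1.0 / (1.0 + np.exp((k - n1 - 5) / 1.5))

eta_t = 0.1  # plays the role of eta - sigma in (33b)
x1 = rng.normal(size=n1)
x2 = rng.normal(size=n2)
E1 = np.sum(x1**2)
x2 *= np.sqrt(eta_t / (1.0 - eta_t) * E1 / np.sum(x2**2))  # makes E(x2) = eta_t * E(x)
x = np.concatenate([x1, x2])
E = np.sum(x**2)

one_minus_KD = np.sum((1.0 - lam) * x**2) / E  # exact value, by (41)
bound = (1.0 - lam[n1 - 1]) * E1 / E + np.sum(x2**2) / E  # right member of (54)
print(one_minus_KD, bound)
```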
Now we may express the noise in a Karhunen-Loeve expansion as

$$z(t) = \sum_{k=1}^{a_1T} z_k^{(1)}\,\psi_k(t) + \sum_{k=1}^{\infty} z_k^{(2)}\,\psi_{a_1T+k}(t), \qquad (56)$$

where again the $\psi_k$ are PSWF's and the $\{z_k^{(i)}\}$ ($1 \le k < \infty$, $i = 1, 2$) are independent normally distributed random variables with mean zero and variance $N = N_0/2$. The output signal $y(t) = s(t) + z(t)$ is

$$y(t) = \sum_{k=1}^{a_1T} y_k^{(1)}\,\psi_k(t) + \sum_{k=1}^{\infty} y_k^{(2)}\,\psi_{a_1T+k}(t), \qquad (57)$$

where the $y_k^{(i)}$ are obtainable by integration from the signal y(t). Further,

$$y_k^{(i)} = x_k^{(i)} + z_k^{(i)}, \qquad (58)$$

so that we conclude that our time-continuous channel, with signals constructed in this way, is equivalent to the type 1 jointly-constrained product channel with parameters (51) and capacity $C^*$ (see Appendix E). Since $R < C^*$, we may therefore construct codes with rate R for either channel with error probability $P_e \to 0$ as $T \to \infty$. This is the direct half of Theorem 2.

(ii) Weak Converse: Say we are given a sequence of codes for our time-continuous channel with parameters $\{T_i\}_{i=1}^{\infty}$ (with $T_i \to \infty$), with code words belonging to $\mathcal{A}_2(T_i, W, P_0)$ (as defined in (47)), with error probability $P_e^{(i)}$, and with rate

$$R > W\ln\left(1 + (1-\eta)\frac{P_0}{N_0W}\right) + \eta\,\frac{P_0}{N_0}. \qquad (59)$$

We shall show that $P_e^{(i)}$ must be bounded away from zero, so that the capacity C (the maximum permissible rate) cannot exceed the right member of (59).

Now, as in the proof of the direct half, we may (by (59)) find a $\delta > 0$ and a $\sigma > 0$ sufficiently small so that

$$R > W(1+\delta)\ln\left[1 + \left(1 - \frac{\eta}{1-\sigma}\right)\frac{P_0}{N_0W(1+\delta)}\right] + \frac{\eta}{1-\sigma}\,\frac{P_0}{N_0} = C^*. \qquad (60)$$

Again, as in the direct half, $C^*$ is the capacity of the type 1 jointly-constrained product channel with parameters

$$P = P_0,\quad N = \frac{N_0}{2},\quad \tilde\eta = \frac{\eta}{1-\sigma},\quad a_1 = 2W(1+\delta),\quad a_2 = \infty. \qquad (61)$$

Now if s(t) is a code word from the code with parameter $T_i$ (so that $s \in \mathcal{A}_2(T_i, W, P_0)$), we may write s(t) as a Fourier series in PSWF's (due to the completeness of the PSWF's in the space of bandlimited functions, Section 4.3):

$$s(t) = \sum_{k=1}^{2WT_i(1+\delta)} x_k^{(1)}\,\psi_k(t) + \sum_{k=1}^{\infty} x_k^{(2)}\,\psi_{k+2WT_i(1+\delta)}(t), \qquad -\infty < t < \infty. \qquad (62)$$

Hence, to each code word s(t) for the time-continuous channel there corresponds a vector $x = (x^{(1)}, x^{(2)})$ whose coordinates are the coefficients in the above Fourier series. We now show that x is an allowable input to the type 1 jointly-constrained product channel with parameters given by (61). From the orthonormality of the PSWF's on $(-\infty, \infty)$ we have from (62) $\|s\|^2 = E(x)$. Since $s(t) \in \mathcal{A}_2(T_i, W, P_0)$, we have $E(x) \le P_0T_i$, so that x satisfies (33a). Further, from the orthogonality of the PSWF's on $[-T_i/2, T_i/2]$ and the monotonicity of the $\lambda_k$, we have

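$$\begin{aligned} 1 - K_D(s, T_i) &= \frac{\|(I - D_{T_i})s\|^2}{\|s\|^2} = \sum_{k=1}^{2WT_i(1+\delta)} \frac{[x_k^{(1)}]^2(1-\lambda_k)}{E(x)} + \sum_{k=1}^{\infty} \frac{[x_k^{(2)}]^2(1-\lambda_{2WT_i(1+\delta)+k})}{E(x)} &\quad (63) \\ &\ge \left[1 - \lambda_{2WT_i(1+\delta)}\right]\frac{E(x^{(2)})}{E(x)}. &\quad (64) \end{aligned}$$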



With $T_i$ sufficiently large (from (44b)) we may put $\lambda_{2WT_i(1+\delta)} \le \sigma$, and since $1 - K_D(s, T_i) \le \eta$,

$$E(x^{(2)}) \le \frac{\eta}{1-\sigma}\,E(x) = \tilde\eta\,E(x), \qquad (65)$$

so that $x^{(2)}$ satisfies (33b).
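The converse inequalities (64) and (65) can be checked the same way. The sketch again uses a synthetic, decreasing stand-in for the $\lambda_k$ (an assumption, not computed PSWF eigenvalues), takes η to be the exact value of $1 - K_D(s, T_i)$ for a random coefficient vector, and uses $\lambda_{2WT_i(1+\delta)}$ itself as σ.

```python
import numpy as np

rng = np.random.default_rng(1)

n1, n2 = 60, 80  # hypothetical: 2*W*T_i*(1+delta) coefficients, then a truncated tail
k = np.arange(1, n1 + n2 + 1)
lam = 1.0 / (1.0 + np.exp((k - 50) / 1.5))  # synthetic decreasing eigenvalues (assumption)

x = rng.normal(size=n1 + n2)  # coefficients of some bandlimited code word, as in (62)
E = np.sum(x**2)
E2 = np.sum(x[n1:] ** 2)  # energy of x^(2)

eta = np.sum((1.0 - lam) * x**2) / E  # 1 - K_D(s, T_i): smallest eta for which (47c) holds
sigma = lam[n1 - 1]  # here lambda_{2WT_i(1+delta)} plays the role of sigma

print("(64):", eta, ">=", (1.0 - sigma) * E2 / E)
print("(65):", E2, "<=", eta / (1.0 - sigma) * E)
```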
Finally, if we proceed as in the proof of the direct half of this theorem and express the noise in a Karhunen-Loeve expansion in PSWF's, we can conclude that for each code for this time-continuous channel we can obtain a code for the time-discrete jointly-constrained product channel with the same rate and error probability (see Appendix E). Since the rate R exceeds the capacity of the latter channel, we conclude from the weak converse to Theorem 1 that the error probability is bounded away from zero. This completes the proof.

The following theorems establish the capacity of the channels defined 
by Models 1 and 3. 

Theorem 3: (Model 1) Let the allowable signal set be $\mathcal{A}_1(T,W,P_0)$, the set of functions s(t) satisfying

$$s(t) = 0, \quad |t| \ge T/2, \qquad (66a)$$

$$\|s\|^2 \le P_0T, \qquad (66b)$$

$$K_B(s, 2\pi W) \ge 1 - \eta \quad (0 < \eta < 1). \qquad (66c)$$

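The noise is a sample from a Gaussian random process with spectral density

$$N(\omega) = \begin{cases} N_0/2, & |\omega| \le 2\pi W, \\ \nu N_0/2, & |\omega| > 2\pi W, \end{cases} \qquad (\nu \le 1). \qquad (67)$$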

Then the channel capacity is

$$C = C_1 = W\ln\left(1 + (1-\eta)\frac{P_0}{N_0W}\right) + \eta\,\frac{P_0}{\nu N_0}. \qquad (68)$$

Theorem 4: (Model 3) Let the allowable signal set be $\mathcal{A}_3(T,W,P_0)$, the set of functions s(t) satisfying

$$s(t) = 0, \quad |t| \ge T/2, \qquad (69a)$$

$$\|s\|^2 \le P_0T, \qquad (69b)$$

$$K_B'(s, 2\pi W) \ge 1 - \eta \quad (0 < \eta < 1), \qquad (69c)$$

where $K_B'$ is the frequency concentration defined by (21). The noise is as in Theorem 2 (48). Then the channel capacity is

$$C = C_3 = W\ln\left(1 + \frac{P_0}{N_0W}\right) + \frac{\eta}{1-\eta}\,\frac{P_0}{N_0}. \qquad (70)$$

Proofs of Theorems 3, 4: Since the proofs of Theorems 3 and 4 parallel 
that of Theorem 2 (which was given in detail above) we shall confine 
ourselves to a few remarks which will enable the interested reader to fill 
in the details on his own. 
Theorem 3: In the direct half we consider, as in the proof of Theorem 2, a jointly-constrained product channel. In this case it is a type-2 channel with parameters

$a_1 = 2W(1-\delta),\quad a_2 = 0,\quad a_3 = \infty,\quad P = P_0,$

Ni = -j (1 — V, N 2 = — , v = V ~ <r, 

where ξ, δ, σ > 0 are "small". In the present proof, this channel plays the role that the type-1 channel played in the proof of the direct half of Theorem 2. Since $a_2 = 0$, we may write a channel input as $x = (x^{(1)}, x^{(3)})$. Corresponding to x we construct an input signal for our time-continuous channel as



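$$s(t) = \sum_{k=1}^{2W(1-\delta)T} x_k^{(1)}\,\frac{D_T\psi_k(t)}{\sqrt{\lambda_k}} + \sum_{k=1}^{\infty} x_k^{(3)}\,\frac{D_T\psi_{k+2W(1-\delta)T}(t)}{\sqrt{\lambda_{k+2W(1-\delta)T}}}, \qquad (72)$$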



where the $\psi_k$ are PSWF's, the $\lambda_k$ the associated eigenvalues (40), and $D_T$ the time-truncation operator (14). Equation (72) replaces (52) in the proof of Theorem 2. It is easily verified that signals of the form (72) belong to $\mathcal{A}_1(T,W,P_0)$ as defined by (66). If we write the noise in the expansion of (46), we can, as in Theorem 2, establish the equivalence of the time-discrete and time-continuous channels, and establish the direct half of Theorem 3. The weak converse is proved in a similar manner, the jointly-constrained product channel employed here being of type-2 with parameters

$a_1 = 2W(1-\delta),\quad a_2 = 4W\delta,\quad a_3 = \infty,\quad P = P_0,$
N N 



2 ' * 1 - <r ' 

where again δ, ξ, σ > 0 are "small".

Theorem 4: For the direct half we consider a type-3 jointly-constrained product channel with parameters

$$a_1 = 2W(1-\delta),\quad a_2 = \infty,\quad P = P_0,\quad N = \frac{N_0}{2},\quad \tilde\eta = \eta - \sigma. \qquad (74)$$

The signals are constructed from vectors x as in (72). For the converse we use a type-3 channel with parameters

$$a_1 = 2W(1+\delta),\quad a_2 = \infty,\quad P = P_0,\quad N = \frac{N_0}{2},\quad \tilde\eta = \frac{\eta}{1-\sigma}. \qquad (75)$$

APPENDIX A 

Proof of Theorem 1 

We shall give a proof of Theorem 1 for the type-1 jointly-constrained 
product channel only. The proofs for types 2 and 3 are similar. 
The proof as usual is in two parts. 

A.1 Direct Half

We set $P_1 = (1-\beta)P$, $P_2 = \beta P$ and consider codes for the ordinary product channel (Section 4.1). If (30) is satisfied for all code words with these values of $P_1$ and $P_2$, then the joint constraint (33a) is also satisfied. Further, since $\beta \le \tilde\eta$, (33b) is also satisfied. Hence the direct half of Lemma A for the ordinary product channel implies that any rate less than $C_1(P_1) + C_2(P_2) = C_1((1-\beta)P) + C_2(\beta P)$ is permissible, and the direct half of Theorem 1 follows.

A.2 Converse

Let us define $C^* = C_1((1-\beta)P) + C_2(\beta P)$. We must show that any rate $R > C^*$ is not permissible. Let us assume the contrary, i.e., for some $R = C^* + \epsilon$ ($\epsilon > 0$) there exists a sequence of numbers $\{T_i\}_{i=1}^{\infty}$, where $T_i \to \infty$ as $i \to \infty$, and a corresponding sequence of codes for the jointly constrained product channel (satisfying (33a) and (33b), with parameters $\tilde\eta$ and P), with the ith code ($i = 1, 2, \ldots$) having parameter $T = T_i$, $e^{RT_i}$ code words, and error probability $P_e = P_e^{(i)}$, where $P_e^{(i)} \to 0$ as $i \to \infty$.

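Since $C_1(x)$ is uniformly continuous on the closed interval $[0, P]$, let us choose an integer $J_0$ (sufficiently large) so that

$$\left|C_1(x) - C_1\!\left(x - \frac{\tilde\eta P}{J_0}\right)\right| < \frac{\epsilon}{2}, \qquad \frac{\tilde\eta P}{J_0} \le x \le P. \qquad (76)$$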

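We now partition the ith code ($i = 1, 2, \ldots$) into $J_0$ classes $S_i(j)$ ($j = 1, 2, \ldots, J_0$). A code word $(x^{(1)}, x^{(2)})$ in the ith code will belong to the jth class $S_i(j)$ according as the energy of its second component satisfies

$$(j-1)\,\frac{\tilde\eta PT_i}{J_0} < \sum_{k=1}^{a_2T_i}\big[x_k^{(2)}\big]^2 \le j\,\frac{\tilde\eta PT_i}{J_0}, \qquad j = 1, 2, \ldots, J_0. \qquad (77)$$

Since $x^{(2)}$ satisfies (33b), and hence $E(x^{(2)}) \le \tilde\eta E(x) \le \tilde\eta PT_i$ by (33a), each code word belongs to exactly one class.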
(To be precise, we assign code words for which the energy in $x^{(2)}$ is zero to class $S_i(1)$.)

For each i ($i = 1, 2, \ldots$), let $S_i^*$ be the subcode of the ith code (with parameter $T = T_i$) consisting of the class $S_i(j)$ ($j = 1, 2, \ldots, J_0$) containing the most members. Since $S_i^*$ is the largest class in a partition of a code with $e^{RT_i}$ code words into $J_0$ classes, the number of code words in $S_i^*$ is at least $e^{RT_i}/J_0$, so that the corresponding transmission rate for $S_i^*$ is

$$R^* \ge R - \frac{1}{T_i}\ln J_0. \qquad (78)$$

Further, since $S_i^*$ is a subcode of the ith code (which has error probability $P_e^{(i)}$), the error probability of $S_i^*$ is not more than $P_e^{(i)}$.
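The pigeonhole step behind (78) can be sketched as follows. The code-word energies, the number of code words, and the helper `energy_class` are hypothetical. The point is only that classifying by (77) and keeping the largest of the $J_0$ classes costs at most $(\ln J_0)/T$ in rate.

```python
import math
import random
from collections import Counter


def energy_class(e2: float, eta_t: float, P: float, T: float, J0: int) -> int:
    """Class j (1..J0) of a code word whose second component has energy e2, per (77).

    Class j collects (j - 1) c < e2 <= j c with c = eta_t * P * T / J0; zero energy goes to class 1.
    """
    if e2 <= 0.0:
        return 1
    j = math.ceil(e2 * J0 / (eta_t * P * T))
    return min(max(j, 1), J0)


random.seed(0)
eta_t, P, T, J0 = 0.2, 1.0, 10.0, 8  # hypothetical parameters
M = 5000  # hypothetical number of code words, M = e^{RT}
# Hypothetical second-component energies, each at most eta_t * P * T as (33a)-(33b) allow.
energies = [random.uniform(0.0, eta_t * P * T) for _ in range(M)]

counts = Counter(energy_class(e, eta_t, P, T, J0) for e in energies)
j0, largest = counts.most_common(1)[0]  # the subcode S_i^* is the largest class

R = math.log(M) / T
R_star = math.log(largest) / T
print("largest class:", j0, "size:", largest, "rate loss:", R - R_star, "<=", math.log(J0) / T)
```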

Since there are a finite number ($J_0$) of classes in the partition of the ith code ($i = 1, 2, \ldots$), there must be at least one $j_0$ ($1 \le j_0 \le J_0$) such that for an infinite number of i the largest class $S_i^*$ is the $j_0$th class $S_i(j_0)$. Let $(i_1, i_2, \ldots)$ be the subsequence of i's for which $S_i^* = S_i(j_0)$. Thus the $\{S_{i_t}^*\}_{t=1}^{\infty}$ are a sequence of codes with rate $R^*$ satisfying (78), and error probability not more than $P_e^{(i_t)}$, where $P_e^{(i_t)} \to 0$ as $t \to \infty$. Further, if a code word $(x^{(1)}, x^{(2)}) \in S_{i_t}^*$, it belongs to the class $S_{i_t}(j_0)$, so that from (77) the energy of the second component satisfies

$$E(x^{(2)}) = \sum_{k=1}^{a_2T_{i_t}}\big[x_k^{(2)}\big]^2 \le \frac{j_0}{J_0}\,\tilde\eta PT_{i_t}, \qquad (79)$$

and from (77) and (33a), the energy of the first component satisfies

$$E(x^{(1)}) = \sum_{k=1}^{a_1T_{i_t}}\big[x_k^{(1)}\big]^2 \le \left[1 - \frac{j_0-1}{J_0}\,\tilde\eta\right]PT_{i_t}. \qquad (80)$$

We conclude that $\{S_{i_t}^*\}_{t=1}^{\infty}$ is a sequence of codes which satisfy the constraints for the ordinary product channel (30) with parameters

$$P_1 = \left[1 - \frac{j_0-1}{J_0}\,\tilde\eta\right]P \quad \text{and} \quad P_2 = \frac{j_0\tilde\eta}{J_0}\,P.$$

Since the error probability for $S_{i_t}^*$, $P_e^{(i_t)}$, tends to 0 as $t \to \infty$, we conclude that the rate $R^*$ is a permissible rate for the ordinary product channel. By the converse half of Lemma A, $R^*$ does not exceed the capacity of this product channel, i.e.,

$$R^* \le C_1\!\left(\left[1 - \frac{j_0-1}{J_0}\,\tilde\eta\right]P\right) + C_2\!\left(\frac{j_0\tilde\eta}{J_0}\,P\right), \qquad (81)$$

where $C_i(x)$ ($i = 1, 2$) is defined by (37b). Applying (76) to (81), we obtain
$$R^* \le C_1((1-\delta)P) + C_2(\delta P) + \frac{\epsilon}{2}, \qquad (82)$$

where $\delta = j_0\tilde\eta/J_0$. Now it follows immediately (by differentiation) from the definition of $C_1(x)$ and $C_2(x)$ that, for fixed $x > 0$, $f(\delta) \triangleq C_1((1-\delta)x) + C_2(\delta x)$ is an increasing function of δ for $\delta < a_2/(a_1+a_2)$, and a decreasing function of δ for $\delta > a_2/(a_1+a_2)$. We conclude that, since $\delta = (j_0/J_0)\tilde\eta \le \tilde\eta$,

$$C_1((1-\delta)P) + C_2(\delta P) \le C_1((1-\beta)P) + C_2(\beta P) = C^*, \qquad (83)$$

where $\beta = \min(\tilde\eta,\ a_2/(a_1+a_2))$. Combining (78), (82), and (83), we obtain

$$R \le C^* + \frac{\epsilon}{2} + \frac{1}{T_{i_t}}\ln J_0. \qquad (84)$$

If we let $t \to \infty$, then $T_{i_t} \to \infty$, and we have from (84)

$$R \le C^* + \frac{\epsilon}{2}.$$

But $R = C^* + \epsilon$, and the contradiction establishes the weak converse to Theorem 1.
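The unimodality claim used in (83) is easy to check numerically. The sketch below assumes $C_i(x) = (a_i/2)\ln(1 + x/(a_iN))$ for the quantities defined in (37b) (an assumption about their form) and uses illustrative values of $a_1, a_2, x, N$. It locates the maximizer of f(δ) on a grid and checks that $f(\delta) \le f(\beta)$ whenever $\delta \le \tilde\eta$.

```python
import math


def c_component(a: float, x: float, N: float) -> float:
    """(a/2) ln(1 + x/(aN)); for a = infinity, its limit x/(2N)."""
    if math.isinf(a):
        return x / (2.0 * N)
    return 0.5 * a * math.log(1.0 + x / (a * N))


def f(delta: float, a1: float, a2: float, x: float, N: float) -> float:
    """f(delta) = C_1((1 - delta) x) + C_2(delta x) from the converse of Theorem 1."""
    return c_component(a1, (1.0 - delta) * x, N) + c_component(a2, delta * x, N)


def beta(eta_t: float, a1: float, a2: float) -> float:
    """beta = min(eta_t, a2/(a1 + a2)); equal to eta_t when a2 is infinite."""
    return eta_t if math.isinf(a2) else min(eta_t, a2 / (a1 + a2))


a1, a2, x, N = 3.0, 1.0, 5.0, 0.5  # illustrative values; a2/(a1 + a2) = 0.25
grid = [i / 10000 for i in range(10001)]
best = max(grid, key=lambda d: f(d, a1, a2, x, N))
print("numerical maximizer:", best, " a2/(a1 + a2):", a2 / (a1 + a2))

# Inequality (83): for every delta <= eta_t, f(delta) <= f(beta).
for eta_t in (0.1, 0.25, 0.4):
    best_below = max(f(d, a1, a2, x, N) for d in grid if d <= eta_t)
    print(eta_t, beta(eta_t, a1, a2), best_below - f(beta(eta_t, a1, a2), a1, a2, x, N))
```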

APPENDIX B 

Proof of Theorem 5 

Theorem 5: Let $\mathcal{A}(T,W,P_0)$ be the set of all s(t) satisfying

(i) $s(t) = 0, \quad |t| > T/2,$ (85a)

(ii) $\|s\|^2 \le P_0T,$ (85b)

(iii) $K_B(s, 2\pi W) \ge 1 - \eta \quad (0 < \eta < 1).$ (85c)

Let the Gaussian noise be additive with spectral density $N(\omega)$, where

$$\frac{1}{2\pi}\int_{-\infty}^{\infty} N(\omega)\,d\omega = N < \infty. \qquad (86)$$

Then $C = \infty$ (for all η).

Proof: Let $R > 0$, $\epsilon > 0$, and η ($0 < \eta \le 1$) be specified and fixed. We shall construct a code satisfying (85) with $M = e^{RT}$ code words and with error probability $P_e \le \epsilon$.

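To begin with, let us choose T sufficiently large so that

$$\frac{1}{\sqrt{4\pi RT}} \le \epsilon, \qquad (87a)$$

$$\lambda_1 \ge 1 - \frac{\eta}{2}, \qquad (87b)$$

where $\lambda_1$ is the first PSWF eigenvalue (40); (87b) can be met because the $\lambda_k$ decrease and $\lambda_{2WT(1-\delta)} \to 1$ by (44a). With T fixed we now construct the code.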

Let us expand the noise in a series of PSWF's:

$$z(t) = \sum_{k=1}^{\infty} z_k\,\frac{\psi_k(t)}{\sqrt{\lambda_k}}, \qquad |t| \le \frac{T}{2}, \qquad (88)$$

where

$$z_k = \int_{-T/2}^{T/2} z(t)\,\frac{\psi_k(t)}{\sqrt{\lambda_k}}\,dt, \qquad (89)$$

and the $\{z_k\}_{k=1}^{\infty}$ are Gaussian random variables with mean zero, but not necessarily independent.
Now from (86) we have

$$\mathcal{E}\int_{-T/2}^{T/2} z^2(t)\,dt = NT, \qquad (90)$$

where "$\mathcal{E}$" denotes expectation. From the orthogonality of the PSWF's (41) we have from (88)

$$NT = \mathcal{E}\int_{-T/2}^{T/2} z^2(t)\,dt = \sum_{k=1}^{\infty} \mathcal{E}(z_k^2). \qquad (91)$$

Thus we can find an integer K sufficiently large so that

$$\mathcal{E}(z_{K+i}^2) \le \frac{\eta P_0}{16R}, \qquad i = 1, 2, \ldots, M. \qquad (92)$$

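With K so chosen, let the M code words be

$$s_i(t) = \sqrt{\Big(1 - \frac{\eta}{2}\Big)P_0T}\;\frac{D_T\psi_1(t)}{\sqrt{\lambda_1}} + \sqrt{\frac{\eta}{2}P_0T}\;\frac{D_T\psi_{K+i}(t)}{\sqrt{\lambda_{K+i}}}, \qquad i = 1, 2, \ldots, M. \qquad (93)$$

Let us first verify that $s_i(t)$, as given by (93), satisfies (85). Equation (85a) follows from the definition of $D_T$ (14). From the orthogonality of the PSWF's (41) we have

$$\|s_i\|^2 = \Big(1 - \frac{\eta}{2}\Big)P_0T + \frac{\eta}{2}P_0T = P_0T, \qquad (94)$$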
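so that (85b) is satisfied. Finally, since by (42) the low-pass filtered version of $D_T\psi_k$ is $\lambda_k\psi_k$, and the $\psi_k$ are orthonormal on the real line,

$$K_B(s_i, 2\pi W) = \frac{\big(1-\frac{\eta}{2}\big)P_0T\,\lambda_1 + \frac{\eta}{2}P_0T\,\lambda_{K+i}}{\|s_i\|^2} \ge \Big(1 - \frac{\eta}{2}\Big)\lambda_1 \ge \Big(1 - \frac{\eta}{2}\Big)^2 \ge 1 - \eta, \qquad (95)$$

where the next to last inequality follows from (87b). Thus $s_i(t) \in \mathcal{A}(T,W,P_0)$. It remains to show that $P_e \le \epsilon$.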
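The arithmetic in (94) and (95) can be written out in a few lines. The sketch below uses hypothetical numerical values and takes $\lambda_{K+i} = 0$ and $\lambda_1 = 1 - \eta/2$, the least favorable values permitted by (87b), so the frequency concentration it reports is a worst case.

```python
def codeword_checks(P0: float, T: float, eta: float, lam1: float, lam_Ki: float):
    """Energy (94) and frequency concentration (95) of the code word s_i in (93).

    s_i = sqrt((1 - eta/2) P0 T) D_T psi_1 / sqrt(lam1)
        + sqrt((eta/2) P0 T) D_T psi_{K+i} / sqrt(lam_Ki).
    The pieces D_T psi_k / sqrt(lambda_k) are orthonormal on [-T/2, T/2] by (41), and by (42)
    low-pass filtering D_T psi_k gives lambda_k psi_k, with the psi_k orthonormal on the line.
    """
    c1_sq = (1.0 - eta / 2.0) * P0 * T  # squared weight of the first piece
    c2_sq = (eta / 2.0) * P0 * T  # squared weight of the second piece
    energy = c1_sq + c2_sq  # (94)
    in_band = c1_sq * lam1 + c2_sq * lam_Ki  # ||B s_i||^2
    return energy, in_band / energy  # (||s_i||^2, K_B(s_i, 2 pi W))


eta, P0, T = 0.3, 1.0, 50.0  # hypothetical values
lam1 = 1.0 - eta / 2.0  # the smallest lambda_1 allowed by (87b)
energy, KB = codeword_checks(P0, T, eta, lam1, lam_Ki=0.0)  # lam_Ki = 0 is the worst case
print(energy, KB, ">=", 1.0 - eta)
```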

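We can write the received signal y(t) in a Fourier series in PSWF's:

$$y(t) = s_i(t) + z(t) = \sum_{k=1}^{\infty} y_k\,\frac{\psi_k(t)}{\sqrt{\lambda_k}}, \qquad |t| \le \frac{T}{2}, \qquad (96)$$

where the $y_k$ are recoverable from y(t) by integration. Say that the receiver disregards all the $y_k$ except $y_{K+1}, y_{K+2}, \ldots, y_{K+M}$. When code word i is transmitted, we may write

$$y_{K+j} = \begin{cases} z_{K+j}, & j \neq i, \\ z_{K+i} + \sqrt{\frac{\eta}{2}P_0T}, & j = i, \end{cases} \qquad j = 1, 2, \ldots, M. \qquad (97)$$

If $y_{K+i}$ is the maximum of the $\{y_{K+j}\}_{j=1}^{M}$, the receiver decodes y(t) as $s_i(t)$. Thus if code word i is transmitted, the error probability is

$$P_{ei} = \Pr\Big\{\bigcup_{j \neq i}\Big[z_{K+j} > z_{K+i} + \sqrt{\tfrac{\eta}{2}P_0T}\Big]\Big\} \le M\max_{j \neq i}\Pr\Big[z_{K+j} - z_{K+i} > \sqrt{\tfrac{\eta}{2}P_0T}\Big]. \qquad (98)$$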



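Now $z_{K+j} - z_{K+i}$ is Gaussian, with mean zero and variance

$$\mathcal{E}\big[(z_{K+j} - z_{K+i})^2\big] \le \Big[\mathcal{E}(z_{K+j}^2)^{1/2} + \mathcal{E}(z_{K+i}^2)^{1/2}\Big]^2 \le \frac{\eta P_0}{4R}, \qquad (99)$$

where the last inequality follows from (92). Thus (98) becomes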

$$P_e \le M\,\mathrm{erf}\big(-\sqrt{2RT}\big), \qquad (100)$$

where

$$\mathrm{erf}(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-u^2/2}\,du$$

is the cumulative error function. Since $\mathrm{erf}(-x) \le e^{-x^2/2}/(\sqrt{2\pi}\,x)$ for $x > 0$, (100) yields, with the help of (87a),

$$P_e \le M\,\frac{e^{-RT}}{\sqrt{4\pi RT}} = \frac{e^{RT}e^{-RT}}{\sqrt{4\pi RT}} \le \epsilon. \qquad (101)$$

Thus the theorem is proven.
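The two probabilistic steps of this proof can be illustrated numerically. The first part of the sketch evaluates both sides of the bound that leads from (100) to (101), with erf taken, as in the paper, to be the standard normal distribution function (computed here from Python's `math.erfc`). The second part is a small Monte Carlo run of the decoder of (97) and (98). The number of code words, the amplitude standing in for $\sqrt{\eta P_0T/2}$, and the noise variance are hypothetical, and independent noise coordinates are assumed only for the simulation.

```python
import math

import numpy as np


def gauss_tail(x: float) -> float:
    """Pr[Z > x] for a standard normal Z, i.e. erf(-x) in the notation of (100)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def bound_100_101(RT: float):
    """Return (M * erf(-sqrt(2RT)), 1/sqrt(4 pi RT)) with M = e^{RT}, as in (100) and (101)."""
    M = math.exp(RT)
    return M * gauss_tail(math.sqrt(2.0 * RT)), 1.0 / math.sqrt(4.0 * math.pi * RT)


for RT in (5.0, 20.0, 50.0):
    print(RT, *bound_100_101(RT))

# Monte Carlo of the "largest coordinate" decoder of (97)-(98). Independent noise
# coordinates with a common variance are an assumption made only for this simulation;
# the proof itself allows correlated z_k.
rng = np.random.default_rng(2)
M, amp, var, trials = 16, 4.0, 0.5, 20000  # hypothetical: amp stands for sqrt(eta P0 T / 2)
y = rng.normal(scale=math.sqrt(var), size=(trials, M))
y[:, 0] += amp  # code word i = 1 is sent
p_err = float(np.mean(np.argmax(y, axis=1) != 0))
union = (M - 1) * gauss_tail(amp / math.sqrt(2.0 * var))  # Var(z_j - z_i) = 2 var here
print("simulated error:", p_err, " union bound:", union)
```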

APPENDIX C

In this appendix we verify inequality (22),

$$1 - K_B(x, 2\pi W) \le 2\left[\frac{1 - K_B'(x, 2\pi W)}{K_B'(x, 2\pi W)}\right]^{1/2}, \qquad (102)$$

where $K_B$ is defined by (13a) and $K_B'$ by (21). Let f(t) be a function with Fourier transform $F(\omega)$, and define the operator B by

$$g = Bf, \qquad (103a)$$

where

$$g(t) = \frac{1}{2\pi}\int_{-2\pi W}^{2\pi W} F(\omega)\,e^{i\omega t}\,d\omega. \qquad (103b)$$

Thus Bf is the result of passing f through an ideal low-pass filter with a passband extending from 0 to W cycles per second. Then

$$K_B(f, 2\pi W) = \frac{\|Bf\|^2}{\|f\|^2}. \qquad (104)$$

Say that $x(t) = 0$ for $|t| \ge T/2$ and $\|x\|^2 < \infty$. We assume that we may write $x = D_T\hat x$, where $\hat x$ is bandlimited to W cycles per second. (If we cannot, then $K_B'(x, 2\pi W) = 0$, and (102) follows immediately.) Let us write

$$\hat x(t) = x(t) + y(t), \qquad (105)$$

where $y(t) = 0$ for $|t| \le T/2$. Then

$$\|\hat x\|^2 = \|x\|^2 + \|y\|^2, \qquad (106)$$

and from the definition of $K_B'$,

$$K_B'(x, 2\pi W) = \frac{\|x\|^2}{\|\hat x\|^2}. \qquad (107)$$
Hence, from (107) and (106),

$$\frac{\|y\|^2}{\|x\|^2} = \frac{1 - K_B'(x, 2\pi W)}{K_B'(x, 2\pi W)}. \qquad (108)$$

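Now, since $\hat x$ is bandlimited, $B\hat x = \hat x$, and since B is a projection, $\|By\| \le \|y\|$; hence

$$\|\hat x\|^2 = \|B\hat x\|^2 = \|Bx + By\|^2 \le \big[\|Bx\| + \|By\|\big]^2 \le \big[\|Bx\| + \|y\|\big]^2 = \|Bx\|^2 + \|y\|^2 + 2\|Bx\|\,\|y\|. \qquad (109)$$

Combining (106) and (109), we have

$$\|x\|^2 + \|y\|^2 = \|\hat x\|^2 \le \|Bx\|^2 + \|y\|^2 + 2\|Bx\|\,\|y\|, \qquad (110)$$

so that (from (104))

$$1 - K_B(x, 2\pi W) \le 2K_B(x, 2\pi W)^{1/2}\,\frac{\|y\|}{\|x\|} \le 2\,\frac{\|y\|}{\|x\|}. \qquad (111)$$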

Finally, from (108) and (111) we have

$$1 - K_B(x, 2\pi W) \le 2\left[\frac{1 - K_B'(x, 2\pi W)}{K_B'(x, 2\pi W)}\right]^{1/2}. \qquad (112)$$

This is inequality (102).
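Since the argument above uses only the facts that B and $D_T$ are orthogonal projections and that $\hat x$ is unchanged by B, inequality (102) can be checked on a discrete analogue. In the sketch below (an illustration under stated assumptions, not part of the paper), B is replaced by zeroing DFT bins outside a band and $D_T$ by zeroing samples outside a centered window. The grid size, band, and window are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n, band, window = 512, 20, 100  # hypothetical grid size, band (in DFT bins), window (samples)
freqs = np.fft.fftfreq(n, d=1.0 / n)  # integer DFT bin indices


def lowpass(v: np.ndarray) -> np.ndarray:
    """Discrete stand-in for B: orthogonal projection onto DFT bins with |bin| <= band."""
    V = np.fft.fft(v)
    V[np.abs(freqs) > band] = 0.0
    return np.real(np.fft.ifft(V))


def truncate(v: np.ndarray) -> np.ndarray:
    """Discrete stand-in for D_T: keep a centered window of samples, zero elsewhere."""
    out = np.zeros_like(v)
    lo, hi = n // 2 - window // 2, n // 2 + window // 2
    out[lo:hi] = v[lo:hi]
    return out


worst = -np.inf
for _ in range(200):
    x_hat = lowpass(rng.normal(size=n))  # "bandlimited": B x_hat = x_hat
    x = truncate(x_hat)  # x = D_T x_hat
    KB = np.sum(lowpass(x) ** 2) / np.sum(x**2)  # discrete analogue of (104)
    KBp = np.sum(x**2) / np.sum(x_hat**2)  # discrete analogue of (107)
    lhs, rhs = 1.0 - KB, 2.0 * np.sqrt((1.0 - KBp) / KBp)
    worst = max(worst, lhs - rhs)
print("largest value of lhs - rhs over all trials (should be <= 0):", worst)
```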

APPENDIX D 

The Capacity of Model 4 

To establish the capacity of the channel defined by Model 4 we must, as always, prove a direct half and a converse. In this appendix we give an outline of the proof of the direct half, and a remark about the proof of the converse.

D.1 Direct Half

Let $R < W\ln[1 + P_0/(N_0W)]$ be given. We show here that for ν sufficiently small we may construct codes for Model 4 with rate R and with vanishing error probability (as $T \to \infty$). By the continuity of the "ln" function we may find $\delta > 0$, $\sigma > 0$ sufficiently small so that

$$R < W(1-\delta)\ln\left[1 + \frac{P_0(1-\sigma)}{N_0W(1-\delta)}\right] = C^*. \qquad (113)$$

We observe that $C^*$ is the capacity of a single time-discrete channel (Section 2.1) with parameters

$$P = P_0(1-\sigma), \quad N = N_0/2, \quad a = 2W(1-\delta). \qquad (114)$$

Since $R < C^*$, we can find a code $\mathcal{C} = \{x_i\}_{i=1}^{M}$ for this time-discrete channel (so that $E(x_i) \le P_0(1-\sigma)T$) with $M = e^{RT}$ code words and with error probability, given that $x_i$ is transmitted ($i = 1, 2, \ldots, M$) and using the minimum distance decoder,

$$P_{ei} = \Pr\Big\{\bigcup_{j \neq i}\big[d_E(x_i, y) > d_E(x_j, y)\big]\Big\} = \Pr\Big\{\bigcup_{j \neq i}\Big[z_{ij} > \frac{d_{ij}}{2}\Big]\Big\} = e^{-\beta T + o(T)}, \qquad (115)$$

where y is the received vector, $d_E(u, v)$ is the Euclidean distance between n-vectors u and v, $d_{ij} = d_E(x_i, x_j)$, $z_{ij}$ is the projection of the noise vector z on the line passing through code words $x_i$ and $x_j$ (taken positive in the direction from $x_i$ to $x_j$), and $\|u\| = [E(u)]^{1/2}$ is the square root of the sum of the squares of the components of u. The exponent β has been estimated by Shannon.³ Since $z_{ij}$ is a Gaussian random variable with mean zero and variance $N_0/2$, we may lower bound $P_{ei}$ by

$$P_{ei} \ge \Pr\Big[z_{ij} > \frac{d_{ij}}{2}\Big] = \mathrm{erf}\Big(-\frac{d_{ij}}{\sqrt{2N_0}}\Big), \qquad j = 1, 2, \ldots, M,\ j \neq i, \qquad (116)$$

where

$$\mathrm{erf}(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-u^2/2}\,du$$

is the cumulative error function. Since for large x, $\mathrm{erf}(-x) = e^{-(x^2/2)(1 + o(1))}$, (115) and (116) yield for large T

$$d_{ij}^2 \ge 4\beta N_0T, \qquad i, j = 1, 2, \ldots, M,\ i \neq j. \qquad (117)$$

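From the code $\mathcal{C}$, let us construct a new code $\hat{\mathcal{C}} = \{\hat x_i\}_{i=1}^{M}$, where

$$\hat x_i = \frac{1}{(1-\sigma)^{1/2}}\,x_i, \qquad i = 1, 2, \ldots, M. \qquad (118)$$

Thus the members $\hat x_i$ of $\hat{\mathcal{C}}$ satisfy

$$E(\hat x_i) \le P_0T. \qquad (119)$$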

Let us now assume that there are two noises in the channel, i.e., the noise vector is $z = z_1 + z_2$. The first noise $z_1$ is the usual spherical Gaussian noise (with variance $N_0/2$), and the second, $z_2$, is an unknown n-vector ($n = aT = 2W(1-\delta)T$) for which we require only

$$E(z_2) \le \nu N_0W(1-\delta)T = \frac{\nu N_0 n}{2}. \qquad (120)$$

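We place no other restrictions on the probability structure of $z_2$. The vector $z_2$ may depend on the code $\hat{\mathcal{C}}$, the code word transmitted, and the value of $z_1$. The noise vector $z_1$ corresponds to the noise function $z_1(t)$ in Model 4, and the noise vector $z_2$ corresponds to $z_2(t)$ in Model 4. If we use $\hat{\mathcal{C}}$ on the time-discrete channel with this noise and use the minimum distance decoder, we have an error probability, given that $\hat x_i$ is transmitted,

$$\hat P_{ei} = \Pr\Big\{\bigcup_{j \neq i}\big[d_E(\hat x_i, y) > d_E(\hat x_j, y)\big]\Big\} = \Pr\Big\{\bigcup_{j \neq i}\Big[(z_1 + z_2)_{ij} > \frac{\hat d_{ij}}{2}\Big]\Big\}, \qquad (121)$$

where $(z_1 + z_2)_{ij}$ is the projection of $z_1 + z_2$ on the line through $\hat x_i$ and $\hat x_j$ (which is parallel to the line through $x_i$ and $x_j$), and

$$\hat d_{ij} = d_E(\hat x_i, \hat x_j) = \frac{d_{ij}}{(1-\sigma)^{1/2}}.$$

Now since projection on a line is linear and, by the Cauchy-Schwarz inequality, cannot exceed the norm $\|\cdot\|$, (120) gives

$$(z_1 + z_2)_{ij} \le z_{1,ij} + \|z_2\| \le z_{1,ij} + \sqrt{\nu N_0W(1-\delta)T}. \qquad (122)$$

Thus the event

$$\Big[(z_1 + z_2)_{ij} > \frac{\hat d_{ij}}{2}\Big] \subset \Big[z_{1,ij} > \frac{d_{ij}}{2(1-\sigma)^{1/2}} - \sqrt{\nu N_0W(1-\delta)T}\Big], \qquad (123)$$

where "⊂" denotes set inclusion. Now we would like to say that the right member of (123) satisfies

$$\Big[z_{1,ij} > \frac{d_{ij}}{2(1-\sigma)^{1/2}} - \sqrt{\nu N_0W(1-\delta)T}\Big] \subset \Big[z_{1,ij} > \frac{d_{ij}}{2}\Big]. \qquad (124)$$

If this is so, then $\hat P_{ei} \le P_{ei} \to 0$ as $T \to \infty$. In fact (124) is satisfied if

$$\frac{d_{ij}}{2(1-\sigma)^{1/2}} - \sqrt{\nu N_0W(1-\delta)T} \ge \frac{d_{ij}}{2}, \qquad (125)$$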



or

$$\nu \le \frac{d_{ij}^2}{4N_0W(1-\delta)T}\left(\frac{1}{(1-\sigma)^{1/2}} - 1\right)^2. \qquad (126)$$

Now from (117), $d_{ij} \ge \sqrt{4\beta N_0T}$, so that if

$$\nu \le \frac{\beta}{W(1-\delta)}\left(\frac{1}{(1-\sigma)^{1/2}} - 1\right)^2, \qquad (127)$$

(126) is satisfied. Hence $\hat P_{ei} \to 0$.
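The condition on ν is explicit enough to evaluate. The sketch below computes the largest ν allowed by (127) and checks condition (125) at the smallest distance permitted by (117), where it holds with equality. Here β is the error exponent of (115), not the β of Appendix A, and all numerical values are hypothetical.

```python
import math


def nu_max(beta: float, W: float, delta: float, sigma: float) -> float:
    """Largest nu permitted by (127)."""
    return beta / (W * (1.0 - delta)) * (1.0 / math.sqrt(1.0 - sigma) - 1.0) ** 2


def shift_margin(d_ij: float, nu: float, N0: float, W: float, delta: float, sigma: float, T: float) -> float:
    """Left member of (125) minus its right member; (125) holds when this is >= 0."""
    lhs = d_ij / (2.0 * math.sqrt(1.0 - sigma)) - math.sqrt(nu * N0 * W * (1.0 - delta) * T)
    return lhs - d_ij / 2.0


beta, W, delta, sigma, N0, T = 0.05, 1000.0, 0.1, 0.2, 1e-3, 30.0  # illustrative values
nu = nu_max(beta, W, delta, sigma)
d_min = math.sqrt(4.0 * beta * N0 * T)  # smallest distance allowed by (117)
print("nu_max:", nu)
print("margin at d_min:", shift_margin(d_min, nu, N0, W, delta, sigma, T))
print("margin at 2 d_min:", shift_margin(2.0 * d_min, nu, N0, W, delta, sigma, T))
```

Larger code-word separations leave a positive margin, while doubling ν beyond the bound makes the margin negative at the minimum distance.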

If we now make the same correspondence between the time-continuous channel and the time-discrete channel which was made in the proof of Theorem 3, we deduce the existence of codes for Model 4 (with rate $R < W\ln[1 + P_0/(N_0W)]$) with $P_e \to 0$ as $T \to \infty$, provided ν is sufficiently small (the choice of ν depending on W, $P_0/N_0$, and R). Note that this construction was done for any η. Thus we have shown, in effect, that the capacity of Model 4 satisfies

$$C = C_4 \ge W\ln\left(1 + \frac{P_0}{N_0W}\right) + \epsilon_1(\nu), \qquad (128)$$

where $\epsilon_1(\nu) \to 0$ as $\nu \to 0$, independent of η.

D.2 Converse 

The proof of the converse also parallels the proofs of the converse halves of Theorems 2, 3, and 4. However, since the noise may depend on the entire code and decoding scheme used (which is not the usual assumption of information theory coding theorems), it is necessary to go back and re-prove Theorem 1 (which in turn depends on Lemma A) for this new situation. Although this task is not a terribly difficult one, it is rather tedious, and we shall sidestep this chore here. It will suffice to state the version of Lemma A which is required here and to leave the rest of the proof to the interested reader.

Lemma A': Let us say that we are given a time-discrete channel as defined in Section I (with parameters a, P), where the noise vector is $z = z_1 + z_2$, where $z_1$ is the usual spherical Gaussian noise with variance N and $z_2$ is an unknown vector for which we require only

$$E(z_2) \le \xi T. \qquad (129)$$

We place no other restriction on the probability structure of $z_2$. The noise vector $z_2$ may depend on the entire code and decoding scheme, the code word transmitted, and the value of $z_1$. We define the error probability $P_e$ as we did in (26) for Model 4, and do likewise for the capacity. Let $C(a, P, N, \xi)$ be the capacity of this channel.
Now consider the product of r time-discrete channels as in Section 4.1 with parameters $(a_i, P_i, N_i)$, $i = 1, 2, \ldots, r$. Here too we assume a second noise vector

$$z_2 = \big(z_2^{(1)}, z_2^{(2)}, \ldots, z_2^{(r)}\big), \qquad (130)$$

which is unknown but must satisfy

$$\sum_{i=1}^{r} E\big(z_2^{(i)}\big) \le \xi T, \qquad (131)$$

and, as above, may depend on the entire code and decoding scheme, the code word transmitted, and the values of the spherical Gaussian noises. Lemma A' states that the capacity C of this channel satisfies

$$C \le \sum_{i=1}^{r} C(a_i, P_i, N_i, \gamma_i\xi), \qquad (132a)$$

where

$$\sum_{i=1}^{r} \gamma_i = 1. \qquad (132b)$$

APPENDIX E 

Equivalence of Time-Discrete and Time-Continuous Models 

In this appendix, we give some details on the validity of the equiva- 
lence of the time-discrete and time-continuous channel models which is 
the key to the proofs of our capacity theorems. 

To begin with, let us consider the direct half of our theorems. In these proofs we deduce the existence of time-continuous coding and decoding schemes from the existence of time-discrete coding and decoding schemes. To be specific, let us consider the proof of the direct half of Theorem 2. We may omit the reference to the Karhunen-Loeve expansion (56) and consider the received signal $y(t) = s_i(t) + z(t)$. Now it follows from Loeve (Ref. 9, p. 472, A) that

$$\mathcal{E}\int_{-T/2}^{T/2} z^2(t)\,dt = \int_{-T/2}^{T/2} R(0)\,dt = N_0WT < \infty, \qquad (133)$$

so that with probability 1, z(t) and, therefore, y(t) are square-integrable. It then follows that the integrals

$$y_k^{(1)} = \frac{1}{\sqrt{\lambda_k}}\int_{-T/2}^{T/2} y(t)\,\psi_k(t)\,dt \quad \text{and} \quad y_k^{(2)} = \frac{1}{\sqrt{\lambda_{a_1T+k}}}\int_{-T/2}^{T/2} y(t)\,\psi_{a_1T+k}(t)\,dt \qquad (134)$$
(where $\psi_k(t)$ and $\lambda_k$ are the kth PSWF and eigenvalue, respectively) exist for all k with probability 1. Further, it follows directly on substituting $y(t) = s_i(t) + z(t)$ into (134) that

$$y_k^{(j)} = s_{ik}^{(j)} + z_k^{(j)}, \qquad j = 1, 2, \qquad (135)$$

where the $z_k^{(j)}$ are independent normally distributed random variables with mean zero and variance $N_0/2$. Thus the decoder for the time-continuous code may obtain the $y_k^{(j)}$ from y(t), make use of the decoding scheme for the time-discrete code, and obtain the same error probability. Hence the direct half of this and the subsequent theorems is valid.

Let us now consider the converse half of our theorems. In each of 
these proofs we assume that for a fixed rate R exceeding capacity, we 
are given a sequence of codes for the time-continuous channel with error 
probability P e . We must show that P e is bounded away from zero. To 
do this we deduce the existence of a corresponding sequence of codes 
with rate R and error probability P e for a time-discrete channel with the 
same capacity as the time-continuous channel. Since we can invoke a 
converse for this time-discrete channel (Theorem 1), we then conclude 
that P e is bounded away from zero. We will now show how to make 
this correspondence precise. Again let us refer specifically to the proof 
of Theorem 2, the others following similarly. 

Let $\{s_i(t)\}_{i=1}^{M}$ be the code for the time-continuous channel, and $x = (x^{(1)}, x^{(2)})$ be the corresponding input to the time-discrete (product) channel. Further, we may write the noise signal z(t) and the received signal y(t) in Fourier series in PSWF's where, as above, all the coordinates are finite with probability 1. We then let $z = (z^{(1)}, z^{(2)})$ and $y = (y^{(1)}, y^{(2)})$ be the vectors whose coordinates are the coefficients in these expansions. We can easily show that

$$y = x + z, \qquad (136)$$

where the coordinates of z are independent random variables with mean zero and variance $N_0/2$. Thus we have established the correspondence of the time-continuous and time-discrete channels and codes. We must now show that the time-continuous and time-discrete codes have the same error probability. In other words, we must show that there exists a decoding scheme for y which has the same error probability as the decoding scheme for the continuous received signal y(t). We proceed as follows.

Let $\mathcal{B}$ be the usual (Kolmogorov) σ-algebra on $\mathcal{L}_2[-T/2, T/2]$, i.e., $\mathcal{B}$ is the σ-algebra generated by the "intervals" of the form

$$\{y(\cdot) : y(t_1) \le \beta_1,\ y(t_2) \le \beta_2,\ \ldots,\ y(t_n) \le \beta_n\}.$$

Corresponding to the code for the time-continuous channel $\{s_i(t)\}_{i=1}^{M}$, we define M probability measures $P_1, P_2, \ldots, P_M$ on $\mathcal{B}$ as follows. If $B \in \mathcal{B}$, then

$$P_i(B) = \Pr\big[(s_i(t) + z(t)) \in B\big], \qquad (137)$$

where the probability in (137) is computed for z(t), a noise sample function. A decoding rule for this code is a set of M disjoint sets $A_i \in \mathcal{B}$ ($i = 1, 2, \ldots, M$), called decoding regions. The error probability given that $s_i(t)$ is transmitted is

$$P_{ei} = 1 - P_i(A_i). \qquad (138)$$

Now let $\hat{\mathcal{B}} \subset \mathcal{B}$ be the sub-σ-algebra on $\mathcal{L}_2[-T/2, T/2]$ consisting of those sets determined by the coefficients of a representation of a function in PSWF's. That is, if $y(t) \in \mathcal{L}_2[-T/2, T/2]$, let

$$y_k^{(1)} = \frac{1}{\sqrt{\lambda_k}}\int_{-T/2}^{T/2} y(t)\,\psi_k(t)\,dt \quad \text{and} \quad y_k^{(2)} = \frac{1}{\sqrt{\lambda_{a_1T+k}}}\int_{-T/2}^{T/2} y(t)\,\psi_{a_1T+k}(t)\,dt.$$

Then $\hat{\mathcal{B}}$ is the σ-algebra generated by intervals of the form $\{y(\cdot) : y_{k_1}^{(j_1)} \le \beta_1, \ldots, y_{k_n}^{(j_n)} \le \beta_n\}$.

A decoding rule for a time-discrete code with M code words is a set of M disjoint $\hat A_i \in \hat{\mathcal{B}}$ ($i = 1, 2, \ldots, M$) (decoding regions), and the error probability given that the vector $x_i$ (the representation of $s_i(t)$ in PSWF's) is transmitted is

$$\hat P_{ei} = 1 - P_i(\hat A_i).$$

Kadota [Ref. 10, Appendix D] has shown that for each $A_i \in \mathcal{B}$ there exists an $\hat A_i \in \hat{\mathcal{B}}$ such that

$$P_i(A_i \,\Delta\, \hat A_i) = 0,$$

where Δ denotes "symmetric difference". Thus, if $\{A_i\}_{i=1}^{M}$ are the decoding regions for a time-continuous code, we can find a set $\{\hat A_i \in \hat{\mathcal{B}}\}_{i=1}^{M}$ of decoding regions for the corresponding time-discrete code such that the error probabilities satisfy $\hat P_{ei} = P_{ei}$.

We conclude that the error probability for the time-discrete code 
equals the error probability for the time-continuous code, and the 
converse is valid. 
GLOSSARY

The following symbols are used throughout the paper:

$M$ = the number of members of a code.
$T$ = time required to transmit a code word.
$R = (1/T)\ln M$ = transmission rate in nats per second.
$C$ = channel capacity = maximum "error free" rate.
$P_{ei}$ = probability that the receiver makes an incorrect decoding decision when code word i is transmitted ($i = 1, 2, \ldots, M$).
$P_e = (1/M)\sum_{i=1}^{M} P_{ei}$ = over-all error probability.
$\mathcal{E}(X)$ = expected value of the random variable X.
$\psi_k, \lambda_k$ = kth prolate spheroidal wave function (PSWF) and eigenvalue, respectively ($k = 1, 2, \ldots$).

The following symbols are used in connection with time-discrete or time-continuous channels:

Time-Discrete Channels:

$x, y, z$ = input, output, and noise vectors, respectively.
$n = aT$ = dimension of the above vectors, so that a is the rate at which the channel passes real numbers.
$E(x)$ = sum of the squares of the coordinates of the vector x.
$P$ = parameter constraining $E(x)$ (x is the channel input).
$N$ = variance of the normally distributed noise.
$r$ = number of components in the product (or parallel combination) of channels.
$x^{(i)}, y^{(i)}, z^{(i)}$ = input, output, and noise vectors, respectively, for the ith component of a product of channels ($i = 1, 2, \ldots, r$).
$n_i, a_i, P_i, N_i$ = parameters n, a, P, N, respectively, for the ith component of a product of channels ($i = 1, 2, \ldots, r$).
$\tilde\eta$ = parameter constraining the relative values of $E(x^{(i)})$ in the product of channels.

Time-Continuous Channels:

$s(t), y(t), z(t)$ = input, output, and noise signals, respectively.
$S(\omega)$ = Fourier transform of s(t).
$\|s\|^2 = \int_{-\infty}^{\infty} s^2(t)\,dt$ = "energy" of s(t).
$K_B(s, 2\pi W) = \frac{1}{2\pi}\int_{-2\pi W}^{2\pi W} |S(\omega)|^2\,d\omega \big/ \|s\|^2$ = (energy) concentration in the frequency band 0 to W cps.
$K_D(s, T) = \int_{-T/2}^{T/2} s^2(t)\,dt \big/ \|s\|^2$ = (energy) concentration in the time interval $[-T/2, T/2]$.
$K_B'(s, 2\pi W)$ = an alternate measure of frequency concentration defined by (21).
$D_T$ = operator which truncates a signal outside the time interval $[-T/2, T/2]$ (see (14)).
$\mathcal{L}_2[-T/2, T/2]$ = the space of square integrable functions defined on $[-T/2, T/2]$.
$W$ = bandwidth of channel.
$P_0$ = average "power" of input signals.
$N_0$ = one-sided spectral density of noise z(t).
$\mathcal{A} = \mathcal{A}_i(T,W,P_0)$ = set of allowable channel input signals (for Model i, i = 1, 2, 3, 4). These signals are approximately time-limited to T sec, approximately bandlimited to W cps, and have energy not exceeding $P_0T$.
$\eta$ = parameter which measures the extent to which signals in $\mathcal{A}$ are not strictly time- or bandlimited.
$\nu$ = parameter which measures the extent to which the noise spectral density is not zero for $|\omega| > 2\pi W$.